Ryan Martin

North Carolina State University & ResearchersOne




Professor of Statistics at North Carolina State University and co-founder of ResearchersOne.


C. Cunen, N. Hjort, and T. Schweder published a comment on our paper, "Satellite conjunction analysis and the false confidence theorem." Here is our response to their comment.

The inferential model (IM) framework produces data-dependent, non-additive degrees of belief about the unknown parameter that are provably valid. The validity property guarantees, among other things, that inference procedures derived from the IM control frequentist error rates at the nominal level. A technical complication is that IMs are built on a relatively unfamiliar theory of random sets. Here we develop an alternative---and practically equivalent---formulation, based on a theory of possibility measures, which is simpler in many respects. This new perspective also sheds light on the relationship between IMs and Fisher's fiducial inference, as well as on the construction of optimal IMs.

Predicting the response at an unobserved location is a fundamental problem in spatial statistics. Given the difficulty in modeling spatial dependence, especially in non-stationary cases, model-based prediction intervals are at risk of misspecification bias that can negatively affect their validity. Here we present a new approach for model-free spatial prediction based on the conformal prediction machinery. Our key observation is that spatial data can be treated as exactly or approximately exchangeable in a wide range of settings. For example, when the spatial locations are deterministic, we prove that the response values are, in a certain sense, locally approximately exchangeable for a broad class of spatial processes, and we develop a local spatial conformal prediction algorithm that yields valid prediction intervals without model assumptions. Numerical examples with both real and simulated data confirm that the proposed conformal prediction intervals are valid and generally more efficient than existing model-based procedures across a range of non-stationary and non-Gaussian settings.

Comment on the proposal to rename the R.A. Fisher Lecture.

A fundamental problem in statistics and machine learning is that of using observed data to predict future observations. This is particularly challenging for model-based approaches because often the goal is to carry out this prediction with no or minimal model assumptions. For example, the inferential model (IM) approach is attractive because it has certain validity guarantees, but requires specification of a parametric model. Here we show that a new perspective on a recently developed generalized IM approach can be applied to construct an IM for prediction that satisfies the desirable validity guarantees without specification of a model. One important special case of this approach corresponds to the powerful conformal prediction framework and, consequently, the desirable properties of conformal prediction follow immediately from the general IM validity theory. Several numerical examples are presented to illustrate the theory and highlight the method's performance and flexibility.

Inferential challenges that arise when data are censored have been extensively studied under the classical frameworks. In this paper, we provide an alternative generalized inferential model approach whose output is a data-dependent plausibility function. This construction is driven by an association between the distribution of the relative likelihood function at the interest parameter and an unobserved auxiliary variable. The plausibility function emerges from the distribution of a suitably calibrated random set designed to predict that unobserved auxiliary variable. The evaluation of this plausibility function requires a novel use of the classical Kaplan--Meier estimator to estimate the censoring rather than the event distribution. We prove that the proposed method provides valid inference, at least approximately, and our real- and simulated-data examples demonstrate its superior performance compared to existing methods.

Bias resulting from model misspecification is a concern when predicting insurance claims. Indeed, this bias puts the insurer at risk of making invalid or unreliable predictions. A method that could provide provably valid predictions uniformly across a large class of possible distributions would effectively eliminate the risk of model misspecification bias. Conformal prediction is one such method that can meet this need, and here we tailor that approach to the typical insurance application and show that the predictions are not only valid but also efficient across a wide range of settings.
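As a rough illustration of the kind of guarantee described above, here is a minimal sketch of generic split conformal prediction in Python. It is not the tailored procedure from the paper. The function name, the least-squares line used as the point predictor, and the absolute-residual score are all assumptions made for this example; the only requirement for the coverage guarantee is that the calibration data and the new observation are exchangeable.

```python
import numpy as np

def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Generic split conformal prediction interval (illustrative sketch).

    The point predictor is a least-squares line, chosen only for simplicity;
    any predictor fitted on the training half could be used instead.
    """
    # Fit the point predictor on the training half only.
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * np.asarray(x, dtype=float) + intercept

    # Nonconformity scores: absolute residuals on the held-out calibration half.
    scores = np.abs(np.asarray(y_cal, dtype=float) - predict(x_cal))
    n = len(scores)

    # Conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the interval must be infinite.
    q = np.inf if k > n else np.sort(scores)[k - 1]

    center = predict(x_new)
    return center - q, center + q
```

The interval covers a new response with probability at least 1 - alpha, marginally over the calibration data and the new point, whatever the error distribution. A poorly chosen predictor only makes the interval wider; it does not break the coverage guarantee.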

Whether the predictions put forth prior to the 2016 U.S. presidential election were right or wrong is a question that led to much debate. But rather than focusing on right or wrong, we analyze the 2016 predictions with respect to a core set of effectiveness principles, and conclude that they were ineffective in conveying the uncertainty behind their assessments. Along the way, we extract key insights that will help to avoid, in future elections, the systematic errors that led to overly precise and overconfident predictions in 2016. Specifically, we highlight shortcomings of the classical interpretations of probability and its communication in the form of predictions, and present an alternative approach with two important features. First, our recommended predictions are safer in that they come with certain guarantees on the probability of an erroneous prediction; second, our approach easily and naturally reflects the (possibly substantial) uncertainty about the model by outputting plausibilities instead of probabilities.

Meta-analysis based on only a few studies remains a challenging problem, as an accurate estimate of the between-study variance is apparently needed, but hard to attain, within this setting. Here we offer a new approach, based on the generalized inferential model framework, whose success lies in marginalizing out the between-study variance, so that an accurate estimate is not essential. We show theoretically that the proposed solution is at least approximately valid, with numerical results suggesting it is, in fact, nearly exact. We also demonstrate that the proposed solution outperforms existing methods across a wide range of scenarios.

In the context of predicting future claims, a fully Bayesian analysis---one that specifies a statistical model, prior distribution, and updates using Bayes's formula---is often viewed as the gold standard, while Bühlmann's credibility estimator serves as a simple approximation. But those desirable properties that give the Bayesian solution its elevated status depend critically on the posited model being correctly specified. Here we investigate the asymptotic behavior of Bayesian posterior distributions under a misspecified model, and our conclusion is that misspecification bias generally has damaging effects that can lead to inaccurate inference and prediction. The credibility estimator, on the other hand, is not sensitive at all to model misspecification, giving it an advantage over the Bayesian solution in those practically relevant cases where the model is uncertain. This raises the question: does robustness to model misspecification require that we abandon uncertainty quantification based on a posterior distribution? Our answer to this question is No, and we offer an alternative Gibbs posterior construction. Furthermore, we argue that this Gibbs perspective provides a new characterization of Bühlmann's credibility estimator.
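For readers unfamiliar with the credibility estimator mentioned above, the following Python sketch shows the textbook empirical Bühlmann calculation for a balanced portfolio. It is not the Gibbs posterior construction from the paper. The function name and the data layout (rows are policyholders, columns are periods) are assumptions made for this example. Only sample means and variances enter the formula, so no parametric model has to be specified.

```python
import numpy as np

def buhlmann_credibility(claims):
    """Textbook empirical Buhlmann credibility premiums (illustrative sketch).

    claims: 2-D array, one row per policyholder and one column per period,
    with every policyholder observed for the same number of periods.
    Returns the credibility premiums and the credibility factor Z.
    """
    claims = np.asarray(claims, dtype=float)
    n_periods = claims.shape[1]

    means = claims.mean(axis=1)   # individual sample means
    mu = means.mean()             # collective (portfolio) mean

    # Expected process variance: average within-policyholder variance.
    epv = claims.var(axis=1, ddof=1).mean()
    # Variance of hypothetical means: between-policyholder variance,
    # corrected for sampling noise and truncated at zero.
    vhm = max(means.var(ddof=1) - epv / n_periods, 0.0)

    # Credibility factor Z = n / (n + EPV / VHM); Z = 0 when VHM is zero.
    z = n_periods / (n_periods + epv / vhm) if vhm > 0 else 0.0

    # Premium: weighted average of individual and collective experience.
    return z * means + (1.0 - z) * mu, z
```

The premium shrinks each policyholder's own average toward the portfolio average, and the shrinkage weakens as more periods are observed or as policyholders differ more from one another.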

An inferential model encodes the data analyst's degrees of belief about an unknown quantity of interest based on the observed data, posited statistical model, etc. Inferences drawn based on these degrees of belief should be reliable in a certain sense, so we require the inferential model to be valid. The construction of valid inferential models based on individual pieces of data is relatively straightforward, but how should these be combined so that the validity property is preserved? In this paper we analyze some common combination rules with respect to this question, and we conclude that the best strategy currently available is one that combines via a certain dimension reduction step before the inferential model construction.

In this paper we adopt the familiar sparse, high-dimensional linear regression model and focus on the important but often overlooked task of prediction. In particular, we consider a new empirical Bayes framework that incorporates data in the prior in two ways: one is to center the prior for the non-zero regression coefficients and the other is to provide some additional regularization. We show that, in certain settings, the asymptotic concentration of the proposed empirical Bayes posterior predictive distribution is very fast, and we establish a Bernstein--von Mises theorem which ensures that the derived empirical Bayes prediction intervals achieve the targeted frequentist coverage probability. The empirical prior has a convenient conjugate form, so posterior computations are relatively simple and fast. Finally, our numerical results demonstrate the proposed method's strong finite-sample performance in terms of prediction accuracy, uncertainty quantification, and computation time compared to existing Bayesian methods.

Statistics has made tremendous advances since the times of Fisher, Neyman, Jeffreys, and others, but the fundamental and practically relevant questions about probability and inference that puzzled our founding fathers remain unanswered. To bridge this gap, I propose to look beyond the two dominating schools of thought and ask the following three questions: what do scientists need out of statistics, do the existing frameworks meet these needs, and, if not, how to fill the void? To the first question, I contend that scientists seek to convert their data, posited statistical model, etc., into calibrated degrees of belief about quantities of interest. To the second question, I argue that any framework that returns additive beliefs, i.e., probabilities, necessarily suffers from false confidence---certain false hypotheses tend to be assigned high probability---and, therefore, risks systematic bias. This reveals the fundamental importance of non-additive beliefs in the context of statistical inference. But non-additivity alone is not enough so, to the third question, I offer a sufficient condition, called validity, for avoiding false confidence, and present a framework, based on random sets and belief functions, that provably meets this condition. Finally, I discuss characterizations of p-values and confidence intervals in terms of valid non-additive beliefs, which imply that users of these classical procedures are already following the proposed framework without knowing it.

Nonparametric estimation of a mixing density based on observations from the corresponding mixture is a challenging statistical problem. This paper surveys the literature on a fast, recursive estimator based on the predictive recursion algorithm. After introducing the algorithm and giving a few examples, I summarize the available asymptotic convergence theory, describe an important semiparametric extension, and highlight two interesting applications. I conclude with a discussion of several recent developments in this area and some open problems.
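To make the recursion concrete, here is a minimal Python sketch of a predictive recursion update on a fixed, uniform grid. The function names, the normal kernel, the uniform initial guess, the Riemann-sum integration, and the weight sequence w_i = (i + 1)^(-gamma) are assumptions made for this example, not choices taken from the paper. Each observation moves the current estimate toward its own one-step posterior, with a weight that decreases as more data arrive.

```python
import numpy as np

def normal_kernel(x, u):
    """Density of N(u, 1) at x; an assumed kernel for this illustration."""
    return np.exp(-0.5 * (x - u) ** 2) / np.sqrt(2.0 * np.pi)

def predictive_recursion(x, grid, kernel=normal_kernel, gamma=0.67):
    """One pass of predictive recursion over the data x (illustrative sketch).

    grid: equally spaced support points for the mixing density.
    Returns the estimated mixing density evaluated on the grid.
    """
    grid = np.asarray(grid, dtype=float)
    du = grid[1] - grid[0]                        # uniform grid spacing
    f = np.full(grid.shape, 1.0 / (grid.size * du))  # uniform initial guess

    for i, xi in enumerate(x, start=1):
        w = (i + 1.0) ** (-gamma)                 # decreasing weight (assumed form)
        joint = kernel(xi, grid) * f              # kernel times current estimate
        marginal = joint.sum() * du               # Riemann sum: mixture density at xi
        # Convex combination of the current estimate and its one-step posterior.
        f = (1.0 - w) * f + w * joint / marginal
    return f
```

Because each update is a convex combination of two densities, the estimate stays non-negative and keeps integrating to one on the grid. The result depends on the order of the data, so averaging over a few random permutations is a common remedy.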

Bayesian methods provide a natural means for uncertainty quantification, that is, credible sets can be easily obtained from the posterior distribution. But is this uncertainty quantification valid in the sense that the posterior credible sets attain the nominal frequentist coverage probability? This paper investigates the frequentist validity of posterior uncertainty quantification based on a class of empirical priors in the sparse normal mean model. In particular, we show that our marginal posterior credible intervals achieve the nominal frequentist coverage probability under conditions slightly weaker than needed for selection consistency and a Bernstein--von Mises theorem for the full posterior, and numerical investigations suggest that our empirical Bayes method has superior frequentist coverage probability properties compared to other fully Bayes methods.

This article describes our motivation behind the development of RESEARCHERS.ONE, our mission, and how the new platform will fulfill this mission. We also compare our approach with other recent reform initiatives such as post-publication peer review and open access publications.

This article describes how the filtering role played by peer review may actually be harmful rather than helpful to the quality of the scientific literature. We argue that, instead of trying to filter out the low-quality research, as is done by traditional journals, a better strategy is to let everything through but with an acknowledgment of the uncertain quality of what is published, as is done on the RESEARCHERS.ONE platform.  We refer to this as "scholarly mithridatism."  When researchers approach what they read with doubt rather than blind trust, they are more likely to identify errors, which protects the scientific community from the dangerous effects of error propagation, making the literature stronger rather than more fragile.  

In a Bayesian context, prior specification for inference on monotone densities is conceptually straightforward, but proving posterior convergence theorems is complicated by the fact that desirable prior concentration properties often are not satisfied. In this paper, I first develop a new prior designed specifically to satisfy an empirical version of the prior concentration property, and then I give sufficient conditions on the prior inputs such that the corresponding empirical Bayes posterior concentrates around the true monotone density at nearly the optimal minimax rate. Numerical illustrations also reveal the practical benefits of the proposed empirical Bayes approach compared to Dirichlet process mixtures.

Accurate estimation of value-at-risk (VaR) and assessment of associated uncertainty is crucial for both insurers and regulators, particularly in Europe. Existing approaches link data and VaR indirectly by first linking data to the parameter of a probability model, and then expressing VaR as a function of that parameter. This indirect approach exposes the insurer to model misspecification bias or estimation inefficiency, depending on whether the parameter is finite- or infinite-dimensional. In this paper, we link data and VaR directly via what we call a discrepancy function, and this leads naturally to a Gibbs posterior distribution for VaR that does not suffer from the aforementioned biases and inefficiencies. Asymptotic consistency and root-n concentration rate of the Gibbs posterior are established, and simulations highlight its superior finite-sample performance compared to other approaches.

Inference on parameters within a given model is familiar, as is ranking different models for the purpose of selection. Less familiar, however, is the quantification of uncertainty about the models themselves. A Bayesian approach provides a posterior distribution for the model but it comes with no validity guarantees, and, therefore, is only suited for ranking and selection. In this paper, I will present an alternative way to view this model uncertainty problem, through the lens of a valid inferential model based on random sets and non-additive beliefs. Specifically, I will show that valid uncertainty quantification about a model is attainable within this framework in general, and highlight the benefits in a classical signal detection problem.

Publication of scientific research all but requires a supporting statistical analysis, anointing statisticians the de facto gatekeepers of modern scientific discovery. While the potential of statistics for providing scientific insights is undeniable, there is a crisis in the scientific community due to poor statistical practice. Unfortunately, widespread calls to action have not been effective, in part because of statisticians’ tendency to make statistics appear simple. We argue that statistics can meet the needs of science only by empowering scientists to make sound judgments that account for both the nuances of the application and the inherent complexity of fundamental effective statistical practice. In particular, we emphasize a set of statistical principles that scientists can adapt to their ever-expanding scope of problems.

Confidence is a fundamental concept in statistics, but there is a tendency to misinterpret it as probability. In this paper, I argue that an intuitively and mathematically more appropriate interpretation of confidence is through belief/plausibility functions, in particular, those that satisfy a certain validity property. Given their close connection with confidence, it is natural to ask how a valid belief/plausibility function can be constructed directly. The inferential model (IM) framework provides such a construction, and here I prove a complete-class theorem stating that, for every nominal confidence region, there exists a valid IM whose plausibility regions are contained by the given confidence region. This characterization has implications for statistics understanding and communication, and highlights the importance of belief functions and the IM framework.

© 2018-2020 Researchers.One