# Probability

## Highlighted Articles

Submitted on 2020-09-27

An Excel spreadsheet programmed in VBA is presented that performs combinatorial calculations to determine the house or player advantage on a number of common baccarat wagers. The user can input any number of decks and a custom number of cards of each rank, which allows real-time tracking of the house or player advantage as the shoe composition changes.
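The spreadsheet's internals are not shown here, but the composition-tracking idea it relies on can be sketched outside VBA. The following Python sketch (hypothetical helper names, not the author's code) maintains a shoe as counts per rank and recomputes draw probabilities as cards are removed, which is the building block for any composition-dependent advantage calculation:

```python
from collections import Counter

def fresh_shoe(decks=8):
    # 13 ranks (indexed 0..12), 4 cards of each rank per deck
    return Counter({rank: 4 * decks for rank in range(13)})

def draw_probabilities(shoe):
    """Probability of drawing each rank from the current shoe."""
    total = sum(shoe.values())
    return {rank: count / total for rank, count in shoe.items()}

def remove_card(shoe, rank):
    """Update the composition after a card of `rank` is dealt."""
    if shoe[rank] <= 0:
        raise ValueError(f"no cards of rank {rank} left")
    shoe[rank] -= 1

shoe = fresh_shoe(decks=8)        # 416 cards
remove_card(shoe, 0)              # one card of rank 0 is dealt
p = draw_probabilities(shoe)
print(round(p[0], 4))             # -> 0.0747 (31 of the 415 remaining cards)
```

Recomputing a wager's edge after every dealt card is then a matter of summing payoff-weighted outcome probabilities over the current composition.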

Submitted on 2019-09-30

Whether the predictions put forth prior to the 2016 U.S. presidential election were right or wrong is a question that led to much debate. But rather than focusing on right or wrong, we analyze the 2016 predictions with respect to a core set of *effectiveness principles*, and conclude that they were ineffective in conveying the uncertainty behind their assessments. Along the way, we extract key insights that will help to avoid, in future elections, the systematic errors that led to overly precise and overconfident predictions in 2016. Specifically, we highlight shortcomings of the classical interpretations of probability and its communication in the form of predictions, and present an alternative approach with two important features. First, our recommended predictions are safer in that they come with certain guarantees on the probability of an erroneous prediction; second, our approach easily and naturally reflects the (possibly substantial) uncertainty about the model by outputting plausibilities instead of probabilities.

This note generalizes the notion of conditional probability to Riesz spaces using the order-theoretic approach. With the aid of this concept, we establish the law of total probability and Bayes' theorem in Riesz spaces; we also prove an inclusion-exclusion formula in Riesz spaces. Several examples are provided to show that the law of total probability, Bayes' theorem and inclusion-exclusion formula in probability theory are special cases of our results.
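For orientation, the classical statements that the note recovers as special cases are, for events $A, A_1, \dots, A_n$ and a partition $B_1, \dots, B_n$ of the sample space:

```latex
\begin{align}
  P(A) &= \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)
        && \text{(law of total probability)} \\
  P(B_j \mid A) &= \frac{P(A \mid B_j)\,P(B_j)}
                        {\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}
        && \text{(Bayes' theorem)} \\
  P\Bigl(\,\bigcup_{i=1}^{n} A_i\Bigr)
       &= \sum_{\emptyset \neq S \subseteq \{1,\dots,n\}}
          (-1)^{|S|+1}\, P\Bigl(\,\bigcap_{i \in S} A_i\Bigr)
        && \text{(inclusion--exclusion)}
\end{align}
```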

We discuss how sampling design, units, the observation mechanism, and other basic statistical notions figure into modern network data analysis. These considerations pose several new challenges that cannot be adequately addressed by merely extending or generalizing classical methods. Such challenges stem from fundamental differences between the domains in which network data emerge and those for which classical tools were developed. By revisiting these basic statistical considerations, we suggest a framework in which to develop theory and methods for network analysis in a way that accounts for both conceptual and practical challenges of network science. We then discuss how some well known model classes fit within this framework.

Submitted on 2020-01-29

The book investigates the misapplication of conventional statistical techniques to fat tailed distributions and looks for remedies, when possible.
Switching from thin tailed to fat tailed distributions requires more than "changing the color of the dress". Traditional asymptotics deal mainly with either n=1 or n=∞, and the real world is in between, under the "laws of the medium numbers", which vary widely across specific distributions. Both the law of large numbers and the generalized central limit mechanisms operate in highly idiosyncratic ways outside the standard Gaussian or Levy-Stable basins of convergence.
A few examples:
+ The sample mean is rarely in line with the population mean, with an effect on "naive empiricism", but can sometimes be estimated via parametric methods.
+ The "empirical distribution" is rarely empirical.
+ Parameter uncertainty has compounding effects on statistical metrics.
+ Dimension reduction (principal components) fails.
+ Inequality estimators (GINI or quantile contributions) are not additive and produce wrong results.
+ Many "biases" found in psychology become entirely rational under more sophisticated probability distributions.
+ Most of the failures of financial economics, econometrics, and behavioral economics can be attributed to using the wrong distributions.
This book, the first volume of the Technical Incerto, weaves a narrative around published journal articles.
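The first bullet, the slow and erratic convergence of the sample mean under fat tails, is easy to see numerically. A minimal sketch (my illustration, not from the book) compares running sample means for a Gaussian and for a Pareto with tail index α = 1.2, both with theoretical mean 6 (for the Pareto, α/(α−1) = 6):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Gaussian with mean 6: the sample mean settles quickly.
gauss = rng.normal(loc=6.0, scale=1.0, size=n)

# Pareto with tail index alpha = 1.2 and minimum 1:
# theoretical mean is alpha / (alpha - 1) = 6, but convergence is very slow
# and dominated by rare huge observations.
alpha = 1.2
pareto = rng.pareto(alpha, size=n) + 1.0

for m in (100, 10_000, 100_000):
    print(f"n={m:>6}: gaussian mean {gauss[:m].mean():7.3f}, "
          f"pareto mean {pareto[:m].mean():7.3f}")
```

The Gaussian column stabilizes near 6 almost immediately; the Pareto column typically sits well below 6 and jumps whenever a tail event lands in the sample, which is the "naive empiricism" failure described above.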

Submitted on 2020-02-15

An important question in economics is how people choose when facing uncertainty in the timing of rewards. In this paper we study preferences over time lotteries, in which the payment amount is certain but the payment time is uncertain. In expected discounted utility (EDU) theory decision makers must be risk-seeking over time lotteries. Here we explore growth-optimality, a normative model consistent with standard axioms of choice, in which decision makers maximise the growth rate of their wealth. Growth-optimality is consistent with both risk-seeking and risk-neutral behaviour in time lotteries, depending on how growth rates are computed. We discuss two approaches to compute a growth rate: the ensemble approach and the time approach. Revisiting existing experimental evidence on risk preferences in time lotteries, we find that the time approach accords better with the evidence than the ensemble approach. Surprisingly, in contrast to the EDU prediction, the higher the ensemble-average growth rate of a time lottery is, the less attractive it becomes compared to a sure alternative. Decision makers thus may not consider the ensemble-average growth rate as a relevant criterion for their choices. Instead, the time-average growth rate may be a better criterion for decision-making.
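The contrast between the two ways of computing a growth rate can be made concrete. As a hedged illustration (the amounts, times, and the exact formulas below are my assumptions, not the paper's): a lottery pays x with equal probability at time t1 or t2; an ensemble-style rate averages the per-realisation growth rates log(x/w)/t across outcomes, while a time-style rate divides the same log-growth by the expected waiting time.

```python
import math

def ensemble_growth(w0, x, times):
    # average of per-realisation exponential growth rates (equal weights)
    return sum(math.log(x / w0) / t for t in times) / len(times)

def time_growth(w0, x, times):
    # growth of wealth over the average waiting time
    return math.log(x / w0) / (sum(times) / len(times))

w0, x = 100.0, 200.0
sure    = [10.0, 10.0]   # payment at t = 10 for sure
lottery = [5.0, 15.0]    # same amount, random timing, same mean time

print(ensemble_growth(w0, x, sure),    time_growth(w0, x, sure))
print(ensemble_growth(w0, x, lottery), time_growth(w0, x, lottery))
```

By Jensen's inequality the ensemble rate of the risky timing exceeds that of the sure payment (E[1/t] > 1/E[t]), while the time rate is identical for both, so a decision maker using the time approach is indifferent even though the lottery's ensemble-average growth rate is higher. This is consistent with the abstract's observation that a higher ensemble-average growth rate need not make a time lottery more attractive.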

Submitted on 2020-03-19

Empirical distributions have their in-sample maxima as natural censoring. We look at the "hidden tail", that is, the part of the distribution in excess of the maximum for a sample size of n. Using extreme value theory, we examine the properties of the hidden tail and calculate its moments of order p.

The method is useful in showing how large a bias one can expect, for a given n, between the visible in-sample mean and the true statistical mean (or higher moments), which is considerable for α close to 1.

Among other properties, we note that the "hidden" moment of order 0, that is, the exceedance probability for power law distributions, follows an exponential distribution and has expectation 1/n regardless of the parametrization of the scale and tail index.
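The 1/n behaviour of the exceedance probability is easy to check by simulation (a sketch under my own parameter choices; for iid continuous samples the exact expectation of the probability mass beyond the sample maximum is 1/(n+1) ≈ 1/n, for any scale or tail index):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, n, trials = 1.5, 100, 20_000

# Pareto with minimum 1; survival function S(x) = x**(-alpha)
samples = rng.pareto(alpha, size=(trials, n)) + 1.0
maxima = samples.max(axis=1)
exceedance = maxima ** (-alpha)   # P(X > observed max), one value per trial

print(exceedance.mean())          # close to 1/(n+1) ~ 1/n, independent of alpha
```

Changing `alpha` or rescaling the samples leaves the average exceedance essentially unchanged, illustrating the parametrization-free claim.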

Submitted on 2020-03-25

Using methods from extreme value theory, we examine the major pandemics in history, trying to understand their tail properties.

Applying the shadow distribution approach developed by the authors for violent conflicts [5], we provide rough estimates for quantities not immediately observable in the data.

Epidemics and pandemics are extremely heavy-tailed, with a potential existential risk for humanity. This property should override conclusions derived from local epidemiological models in what relates to tail events.

Submitted on 2020-05-14

The spread of infectious disease in a human community or the proliferation of fake news on social media can be modeled as a randomly growing tree-shaped graph. The history of the random growth process is often unobserved but contains important information such as the source of the infection. We consider the problem of statistical inference on aspects of the latent history using only a single snapshot of the final tree. Our approach is to apply random labels to the observed unlabeled tree and analyze the resulting distribution of the growth process, conditional on the final outcome. We show that this conditional distribution is tractable under a shape-exchangeability condition, which we introduce here, and that this condition is satisfied for many popular models for randomly growing trees such as uniform attachment, linear preferential attachment and uniform attachment on a D-regular tree. For inference of the root under shape-exchangeability, we propose computationally scalable algorithms for constructing confidence sets with valid frequentist coverage as well as bounds on the expected size of the confidence sets. We also provide efficient sampling algorithms which extend our methods to a wide class of inference problems.
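A minimal sketch of the uniform attachment model mentioned in the abstract (illustrative code, not the authors' implementation): each arriving node attaches to a uniformly chosen existing node, and the inference problem is posed given only the final unlabeled shape.

```python
import random

def uniform_attachment_tree(n, seed=0):
    """Grow a tree on nodes 0..n-1; node k attaches to a uniform earlier node."""
    rng = random.Random(seed)
    parent = {0: None}                 # node 0 is the (latent) root
    for k in range(1, n):
        parent[k] = rng.randrange(k)   # uniform over existing nodes 0..k-1
    return parent

tree = uniform_attachment_tree(10)
# The paper's question: observing only the unlabeled shape of `tree`
# (the arrival order and labels erased), construct a confidence set of
# nodes guaranteed, with frequentist coverage, to contain the root.
print(tree)
```

Shape-exchangeability is what makes the distribution of histories conditional on this observed shape tractable; the sketch above only reproduces the forward growth process.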

Submitted on 2020-08-14

The inferential model (IM) framework produces data-dependent, non-additive degrees of belief about the unknown parameter that are provably valid. The validity property guarantees, among other things, that inference procedures derived from the IM control frequentist error rates at the nominal level. A technical complication is that IMs are built on a relatively unfamiliar theory of random sets. Here we develop an alternative---and practically equivalent---formulation, based on a theory of possibility measures, which is simpler in many respects. This new perspective also sheds light on the relationship between IMs and Fisher's fiducial inference, as well as on the construction of optimal IMs.
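To make the possibility-measure formulation concrete, here is a hedged toy example (my illustration, not from the paper): a plausibility contour for the mean θ of a single N(θ, 1) observation x, pl_x(θ) = 1 − |2Φ(x − θ) − 1|. The contour peaks at θ = x, and its level sets {θ : pl_x(θ) ≥ α} recover the usual (1 − α) confidence intervals, which is the flavour of the validity guarantee.

```python
from math import erf, sqrt

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def plausibility(theta, x):
    """Possibility contour for the mean of a N(theta, 1) observation x."""
    return 1.0 - abs(2.0 * Phi(x - theta) - 1.0)

x = 1.3
print(plausibility(1.3, x))         # 1.0 at theta = x
print(plausibility(1.3 + 1.96, x))  # ~0.05: edge of the 95% interval
```

Because the output is a possibility measure rather than an additive probability, degrees of belief in a hypothesis and its complement need not sum to one, which is how model uncertainty is carried without being forced into a single probability.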

Submitted on 2020-08-23

The most common bets in 19th-century casinos were even-money bets on red or black in Roulette or Trente et Quarante. Many casino gamblers allowed themselves to be persuaded that they could make money for sure in these games by following betting systems such as the d'Alembert. What made these systems so seductive? Part of the answer is that some of the systems, including the d'Alembert, can give bettors a very high probability of winning a small or moderate amount. But there is also a more subtle aspect of the seduction. When the systems do win, their return on investment --- the gain relative to the amount of money the bettor has to take out of their pocket and put on the table to cover their bets --- can be astonishingly high. Systems such as le tiers et le tout, which offer a large gain when they do win rather than a high probability of winning, also typically have a high upside return on investment. In order to understand these high returns on investment, we need to recognize that the denominator --- the amount invested --- is random, as it depends on how successive bets come out.

In this article, we compare some systems on their return on investment and their success in hiding their pitfalls. Systems that provide a moderate gain with a very high probability seem to accomplish this by stopping when they are ahead and more generally by betting less when they are ahead or at least have just won, while betting more when they are behind or have just lost. For historical reasons, we call this martingaling. Among martingales, the d'Alembert seems especially good at making an impressive return on investment quickly, encouraging gamblers' hope that they can use it so gingerly as to avoid the possible large losses, and this may explain why its popularity was so durable.
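The d'Alembert's behaviour is simple to simulate (a rough sketch under my own conventions, not the article's analysis): raise the stake by one unit after a loss, lower it by one after a win, stop when ahead by a target, and measure the return relative to a crude proxy for the out-of-pocket investment, namely the deepest the cumulative balance went below zero.

```python
import random

def d_alembert(p_win=18/37, target=10, max_rounds=1000, seed=None):
    """One d'Alembert session on even-money bets.

    Returns (net gain, out-of-pocket investment); the investment is
    approximated by the worst drawdown of the running balance.
    """
    rng = random.Random(seed)
    balance, stake, worst = 0, 1, 0
    for _ in range(max_rounds):
        if rng.random() < p_win:
            balance += stake
            stake = max(1, stake - 1)
        else:
            balance -= stake
            worst = min(worst, balance)
            stake += 1
        if balance >= target:
            break
    return balance, max(1, -worst)

sessions = 2000
wins, rois = 0, []
for seed in range(sessions):
    gain, invested = d_alembert(seed=seed)
    if gain > 0:
        wins += 1
        rois.append(gain / invested)

print(f"fraction of sessions ending ahead: {wins / sessions:.2f}")
print(f"median ROI when ahead: {sorted(rois)[len(rois) // 2]:.2f}")
```

Most sessions end ahead by a modest amount, and because the denominator (the money actually taken out of pocket) is random and often small, the realised return on investment in winning sessions can be strikingly large, which is the seduction the article describes; the occasional losing session, with stakes grown large, carries the offsetting cost.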

We also discuss the lessons that this aspect of gambling can have for evaluating success in business and finance and for evaluating the results of statistical testing.