The most common bets in 19th-century casinos were even-money bets on red or black in Roulette or Trente et Quarante. Many casino gamblers allowed themselves to be persuaded that they could make money for sure in these games by following betting systems such as the d'Alembert. What made these systems so seductive? Part of the answer is that some of the systems, including the d'Alembert, can give bettors a very high probability of winning a small or moderate amount. But there is also a more subtle aspect of the seduction. When the systems do win, their return on investment --- the gain relative to the amount of money the bettor has to take out of their pocket and put on the table to cover their bets --- can be astonishingly high. Systems such as le tiers et le tout, which offer a large gain when they do win rather than a high probability of winning, also typically have a high upside return on investment. In order to understand these high returns on investment, we need to recognize that the denominator --- the amount invested --- is random, as it depends on how successive bets come out.
In this article, we compare some systems on their return on investment and their success in hiding their pitfalls. Systems that provide a moderate gain with a very high probability seem to accomplish this by stopping when they are ahead and more generally by betting less when they are ahead or at least have just won, while betting more when they are behind or have just lost. For historical reasons, we call this martingaling. Among martingales, the d'Alembert seems especially good at making an impressive return on investment quickly, encouraging gamblers' hope that they can use it so gingerly as to avoid the possible large losses, and this may explain why its popularity was so durable.
We also discuss the lessons that this aspect of gambling can have for evaluating success in business and finance and for evaluating the results of statistical testing.
Submitted on 2019-09-30
This paper examines the development of Laplacean practical certainty from 1810, when Laplace proved his central limit theorem, to 1925, when Ronald A. Fisher published his Statistical Methods for Research Workers.
Although Laplace's explanations of the applications of his theorem were accessible to only a few mathematicians, expositions published by Joseph Fourier in 1826 and 1829 made the simplest applications accessible to many statisticians. Fourier suggested an error probability of 1 in 20,000, but statisticians soon used less exigent standards. Abuses, including p-hacking, helped discredit Laplace's theory in France to the extent that it was practically forgotten there by the end of the 19th century, yet it survived elsewhere and served as the starting point for Karl Pearson's biometry.
The probability that a normally distributed random variable is more than three probable errors from its mean is approximately 5%. When Fisher published his Statistical Methods, three probable errors was a common standard for likely significance. Because he wanted to enable research workers to use distributions other than the normal -- the t distributions, for example --- Fisher replaced three probable errors with 5%.
The use of significant after Fisher differs from its use by Pearson before 1920. In Pearson's Biometrika, a significant difference was an observed difference that signified a real difference. Biometrika's authors sometimes said that an observed difference is likely or very likely to be significant, but they never said that it is very significant, and they did not have levels of significance. Significance itself was not a matter of degree.
What might this history teach us about proposals to curtail abuses of statistical testing by changing its current vocabulary (p-value, significance, etc.)? The fact that similar abuses arose before this vocabulary was introduced suggests that more substantive changes are needed.