What Do Bayesians Guarantee?
Jake Metzger
The Trope
A common trope I see from pop-frequentists is that Bayesian inference doesn’t have any (theoretical or practical) guarantees, where frequentist inference does.
As is often the case with bald statements like that, 1) the statement is almost certainly false a priori, and 2) insofar as it could be true, it relies on unstated presumptions or norms. In this case, it’s very much both.
Frequentist Inferential Guarantees - Error Control
Let’s recall that frequentist inference, insofar as we can address it univocally, is typically concerned with “empirical” probabilities, particularly in relation to physical data generating processes or mechanisms. In this sort of inference, it is often hypothesized that there is some true data generating process out there which is described by a statistical model featuring one or more stochastic variables, each with a true parameter value. The goal of frequentist inference, then, is often to yield unbiased estimates of these true parameter values.
Furthermore, frequentist inference is often applied to itself in the following sense: the result of frequentist estimation is itself a stochastic process dependent upon the contingencies in the data analyzed. If the data had been different, the frequentist estimate would have been different. Given a working hypothesis about how the data is generated, the frequentist estimation procedure can account for how its estimates would have come out had the data been different. By considering other ways the data could have been, under a tentatively-accepted null hypothesis, frequentist procedures can control the rates at which their inferences will fail.
For example, when a frequentist inference procedure yields a 95% confidence interval [0.5, 0.75] for a parameter X, they’re not directly saying that there’s a 95% chance that X is in [0.5, 0.75]. Instead, they’re saying that, over all hypothesized ways the data could have turned out, their confidence intervals (each corresponding to one of these alternatives) cover the true parameter 95% of the time; they’re saying that their intervals are 95% ‘reliable’, and their interval happens to be [0.5, 0.75] this time.
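The coverage reading can be checked with a small simulation. The following sketch assumes a toy normal model with a known spread; the true parameter value, sample size, and trial count are all illustrative assumptions, not anything from a real analysis:

```python
import math
import random

random.seed(0)

TRUE_MU = 0.6     # hypothetical true parameter (illustrative)
SIGMA = 1.0       # assumed known standard deviation
N = 50            # sample size per repeated experiment
TRIALS = 10_000   # number of hypothetical repetitions
Z = 1.96          # two-sided 95% normal quantile

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MU, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    half_width = Z * SIGMA / math.sqrt(N)
    # does this repetition's interval cover the true parameter?
    if mean - half_width <= TRUE_MU <= mean + half_width:
        covered += 1

print(covered / TRIALS)  # close to 0.95
```

The 95% attaches to the long-run behavior of the interval-producing procedure across the repetitions, not to any single interval the loop produces.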
Frequentist Coverage at the Shooting Gallery
One way I visualize this is to consider a shooting gallery with a shotgun with decent spray: the gun hits its hidden target (i.e. covers the true parameter) 95% of the time. Of course, the idea is that one wants a gun with good precision, so as to reduce collateral damage to the rest of the gallery. The analogue here is that a narrower confidence interval is generally more useful in locating the parameter in question, at least at the same alpha level.
Notice that frequentist inference attaches its probability statements to the inference procedure rather than to the parameter in question. That is, frequentists are happy to tell you how generally reliable their gun is at hitting targets, but they don’t answer questions about any particular shot.
Hypothesis Testing between Alternatives
One apparent issue with frequentist testing is that while it can control the empirical error rate of its procedures (given that certain assumptions are true), it yields some apparently pathological results in determining which amongst competing theories should be accepted.
That is, if our live hypotheses consist of H1 and H2, and a frequentist test rejects H1 at the desired significance level (typically 5%, i.e. 95% confidence), then this does not mean that we should accept H2. From a frequentist perspective, H1 failing to be accepted is not any reason in favor of H2, despite the fact that H1 and H2 being mutually exclusive. A separate test should be conducted for H2, and we may well end up in a situation where both hypotheses get rejected, despite knowing that one of the two is the case. And the situation is worse if there are more alternatives to consider, since the more tests one conducts, the more adjustments one must make to the testing procedure, lest the desired error rate be inflated.
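Here is a toy illustration of both hypotheses being rejected at once. The setup is a made-up binomial example tested with the usual normal approximation; the hypothesized proportions and the observed counts are assumptions chosen purely to exhibit the pathology:

```python
import math

def two_sided_z_test(k, n, p0):
    """Normal-approximation test of H0: p = p0, given k successes in n trials."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (k / n - p0) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

k, n = 500, 1000                      # observed: 500 successes in 1000 trials
p_h1 = two_sided_z_test(k, n, 0.4)    # H1: p = 0.4
p_h2 = two_sided_z_test(k, n, 0.6)    # H2: p = 0.6

print(p_h1 < 0.05, p_h2 < 0.05)       # True True: each hypothesis is rejected
```

Even if we somehow knew one of H1 and H2 had to be true, data sitting between them is strongly incompatible with each, and separate tests happily reject both.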
Unpacking the frequentist gloss, though, this can seem less pathological. After all, they’re testing the compatibility of the hypotheses with the data and only want to reject hypotheses that are strongly incompatible with the data they’ve seen. Certainly data can be so bad that it’s apparently incompatible with both hypotheses, regardless of their logical relationship. In practice, you’d probably want to go out and get better data, then. But this cannot always be done.
To many, this still seems quite strange. After all, we’re not really interested in just filtering out bad or data-incompatible hypotheses – we want to know which hypothesis or hypotheses are, in some sense, made most likely or most promising given the data we’ve been given. The frequentist is right in the sense that filtering out H1 as a bad hypothesis doesn’t mean that H2 is a good one, but surely the logical relation between H1 and H2 has something to say here about what we should infer.
Bayesian Inferential Guarantees - Consistency+
In contrast to the above, Bayesian analysis does not tie probabilities to empirical frequencies or error rates, though it can certainly be informed by those things. Rather, Bayesian analysis more broadly considers the information available in a given state.
Bayesian Consistency at the Betting Window
One guarantee that Bayesians have over alternatives is probabilistic consistency. That is, the Bayesian’s accounting of evidence across hypotheses always forms a valid probability distribution. We’ve seen in the above section that frequentism does not value this: recall that frequentism can reject H1 without accepting H2, despite H1 and H2 being mutually exclusive. On a Bayesian view, if H1 and H2 are mutually exclusive hypotheses, evidence against H1 counts in favor of H2, and vice versa.
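A minimal sketch of that consistency, using a toy coin model with two mutually exclusive and (by assumption) exhaustive hypotheses; the hypothesized heads probabilities, priors, and observed counts are all illustrative:

```python
# Two mutually exclusive, exhaustive hypotheses about a coin's heads probability
p1, p2 = 0.4, 0.6            # H1 and H2 (assumed values, for illustration)
prior1, prior2 = 0.5, 0.5    # equal prior weight on each

heads, tails = 7, 3          # hypothetical observed data

like1 = p1**heads * (1 - p1)**tails
like2 = p2**heads * (1 - p2)**tails

# Bayes' theorem: posteriors normalize over the two live hypotheses
evidence = prior1 * like1 + prior2 * like2
post1 = prior1 * like1 / evidence
post2 = prior2 * like2 / evidence

print(round(post1, 3), round(post2, 3))  # 0.165 0.835
```

The posteriors necessarily sum to one: the same data that pushes probability away from H1 pushes it onto H2. There is no analogue of rejecting both.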
This consistency shows up in so-called Dutch Book arguments [3], hypothetical situations which show the practical irrationality of departure from Bayesian probabilism. Much ink has been spilled over the topic of Dutch Books, but it’s been shown multiple times over that if you fail to be probabilistically consistent in your inferences, you’re subject to a sure-loss Dutch Book [1]. It’s been further shown, through converse Dutch Books, that Bayesian consistency insulates one from such sure losses [3].
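The sure-loss mechanism is simple arithmetic. A toy version, with the incoherent credence values assumed for illustration:

```python
# Incoherent credences violating probabilism: P(A) + P(not A) = 1.2 > 1
cred_A, cred_not_A = 0.6, 0.6   # assumed agent credences (illustrative)

# At credence q, the agent regards paying q for a ticket worth 1 if the
# event occurs as a fair deal. The bookie sells the agent both tickets.
stake = cred_A + cred_not_A      # 1.2 paid up front

# Exactly one ticket pays off, however A turns out.
nets = []
for a_is_true in (True, False):
    payout = 1.0
    nets.append(payout - stake)

print(nets)  # a sure loss of 0.2 either way
```

Because the agent’s credences sum to more than one, the bookie collects more in stakes than any outcome can pay back; coherent credences (summing to exactly one) leave no such gap to exploit.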
Frequentists may or may not care about the purported irrationality of being subject to a Dutch Book, but this is a genuine guarantee provided by Bayesianism. I would submit that it is a minimum requirement of rationality that no one be able to set up a betting scenario that leads to a sure loss on your part. That is, it’s irrational to be an exploitable patsy. However, if you disagree, I have the game for you, and, if you did your reading, you already know the rules…
Objective Bayesian Guarantees, in general
Since I generally follow Jon Williamson in his non-standard Bayesianism (as of late, anyway), it would be remiss for me to leave out the additional guarantees provided by his brand of Objective Bayesianism. Recall that Objective Bayesianism has three governing norms:
- Probabilism: Rational belief is subject to the laws of probability.
That is, non-negativity, unity, and countable additivity.
- Calibration: Rational belief is calibrated to empirical probabilities, where they exist.
That is, rational belief should be constrained to those possibilities that lie within the convex hull of points compatible with empirical chance (and any other physical, ontological, logical, etc, constraints).
- Equivocation: Rational belief is otherwise equivocal between basic alternatives.
That is, our inference should make minimal commitments otherwise.
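The three norms can be sketched together on a toy problem. This is not Williamson’s formal apparatus, just an illustration: the calibration constraint value is assumed, and the crude grid search stands in for a proper maximum-entropy optimization:

```python
import math

# Three mutually exclusive basic alternatives A1, A2, A3.
# Calibration: suppose empirical chance fixes P(A1) = 0.5 (assumed).
# Probabilism: the remaining mass, 0.5, must be split between A2 and A3.
# Equivocation: among the admissible distributions, pick maximum entropy.

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

fixed = 0.5
best = None
# crude grid search over P(A2); P(A3) is then determined by unity
for i in range(1, 50):
    p2 = (1 - fixed) * i / 50
    p3 = 1 - fixed - p2
    h = entropy([fixed, p2, p3])
    if best is None or h > best[0]:
        best = (h, p2, p3)

print(round(best[1], 3), round(best[2], 3))  # 0.25 0.25: equivocation
```

Calibration pins down what the evidence settles; equivocation splits the leftover probability evenly between the otherwise indistinguishable alternatives.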
As pointed out above, probabilism avoids Dutch Books, which are situations with the possibility of sure loss. Williamson points out that Calibration avoids positive expected loss and highly probable long-run loss. He also points out that, at least under certain conditions, Equivocation minimizes worst-case expected loss. Indeed I recommend closely reading chapter 3 of his book [1].
So, far from having no guarantees, Bayesianism offers significant guarantees on its inferences, along with well-known proofs that only Bayesianism can guarantee against the sure losses that arise in Dutch Books. It’s another question entirely whether frequentists can be bothered to care about various sources of loss and various inconsistencies in their information state. Indeed, they can get far with their calibration-only approach. Nonetheless, they can be Dutch Booked while Bayesians cannot – that’s just an inconvenient fact.
Further, Objective Bayesianism, because of its incorporation of frequentist evidence as a part of its inference procedure, cannot, ceteris paribus, perform worse than frequentist inference on average, even according to frequentist criteria such as long-run error control. Without its oft-presumed empirical advantage over Bayesian approaches, frequentism is on significantly weaker ground than Bayesianism, especially in philosophical foundations.
Closing Comparison
One rather fitting comparison between the kinds of guarantees frequentist inference provides and the kinds that Bayesian inference provides is given by a quip from Frank Harrell [2] in a related context:
In poker the goal is to maximize your chance of winning the game. It’s not to minimize your probability of having betted for those games you lost.
This quip is deeper than it looks. First, it notes that we very rarely are concerned with inference for its own sake – we’re often trying to do something with that information. And by doing something with knowledge that we’ve gained from inference, we are taking on risks that are tied to the truth or falsity of the particular facts we’ve inferred. Actions are, in a very real sense, bets on what the truth is. Our inference procedures should be equipped for us to reap the practical benefits of correct inference, not simply control our potential losses. Objective Bayesian inference has a place for empirical error control, but does not agree that that is all inference should be concerned with. That is, Objective Bayesian inference is trying to win the game.
More generally and obviously, the fact that frequentist inference does not actually answer the questions we care about (usually of the form p(theta | X)) is a smell that it is incomplete and that it must be situated within a larger epistemological view. Objective Bayesianism is one such view, and arguably the best such contemporary view. Once situated properly, it’s clear that frequentist fundamentalism was always dead on arrival. [4]
References
[1]: Williamson, Jon. In Defence of Objective Bayesianism. Oxford University Press, 2010.
[2]: https://stats.stackexchange.com/questions/386369/p-value-fisherian-vs-contemporary-frequentist-definitions/632770
[3]: Pettigrew, Richard. Dutch Book Arguments. Elements in Decision Theory and Philosophy. Cambridge: Cambridge University Press, 2020.
[4]: Again, this is not to say that frequentist concepts are not useful. Like I’ve said, I believe they are and that Bayesians should take them into account. It’s largely the frequentist norms and inference procedure that are problematic, aside from the general huff-puffery of their pop-adherents.