David and Luigi in jail
I’ll tell you a secret: when I read a statistical statement, I often wonder whether it’s a temporal statement or an ensemble statement. Do you do that too? Take this headline from The Guardian, for example: "Young black people nine times more likely to be jailed than young white people". I won’t discuss why this might be; that’s not the point of this post. My point, as usual, is about time versus ensembles. If you read the Guardian article, you’ll find that the headline is an ensemble statement. It’s meant to convey that the proportion of the ensemble of black people under the age of 18 in the UK who were in jail (in a broad sense) when the statistics were collected was nine times higher (0.09%) than the corresponding figure for white people (0.01%).
It would be a temporal statement if it meant that your friend David, who is black, spent around 142 hours in jail before he turned 18, whereas your friend Luigi, who is white, spent only around 16 hours in jail before he turned 18. It obviously doesn’t mean that. But, perhaps less obviously, it also doesn’t mean that when David was born he was in any meaningful way nine times more likely than Luigi to be in jail before turning 18: what happens along a single life path over time is not reflected by these aggregate figures.
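The 142-hour and 16-hour figures come from (mis)reading the snapshot proportions as fractions of each person’s first 18 years. Here is a minimal Python sketch of that arithmetic. It assumes 365.25-day years, and the helper name is mine, chosen for illustration:

```python
# Hypothetical helper: turns an ensemble snapshot proportion into the hours it
# WOULD imply if it were (wrongly) read as a fraction of one person's time.
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length incl. leap days
YEARS = 18

def naive_hours_in_jail(ensemble_fraction: float, years: float = YEARS) -> float:
    """Hours implied by treating an ensemble fraction as a time fraction."""
    return ensemble_fraction * years * HOURS_PER_YEAR

black_fraction = 0.09 / 100  # 0.09% of the ensemble, as reported
white_fraction = 0.01 / 100  # 0.01% of the ensemble, as reported

for name, frac in [("David", black_fraction), ("Luigi", white_fraction)]:
    print(f"{name}: about {naive_hours_in_jail(frac):.0f} hours")
```

The conversion itself is just multiplication. The error lies in the reading, not the arithmetic.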
The word "likely" does not specify whether the probabilities it reflects are relative frequencies in an ensemble or in time. Such unspecific language is problematic when only one interpretation is correct. So: people’s experiences of the penal system are best not talked about in probabilistic terms. Let’s generalize this recommendation: we shouldn’t talk about anything in probabilistic terms unless we’re convinced that the time and ensemble interpretations of what we’re saying are equivalent. Nassim Taleb, in his latest book, put it laconically as "no probability without ergodicity."
It’s not just this one example: lots of statistical statements are phrased in probabilistic language, with the implicit (and often false) assumption that ensemble interpretations and temporal interpretations of that language are equivalent. That assumption is called the "ergodic hypothesis." In the Guardian example, reading only the headline and then wrongly assuming ergodicity can easily lead to horrendous misinterpretations, so let’s watch our language, seriously.
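For readers who like symbols, here is one common way to write the two readings down. The notation is added here for illustration. For a quantity $f$ of a process $x(t)$, the time average along one trajectory over a duration $T$ and the ensemble average over $N$ trajectories at a fixed time $t$ are

$$\bar{f}_T = \frac{1}{T}\int_0^T f\big(x(t)\big)\,dt, \qquad \langle f\rangle_N(t) = \frac{1}{N}\sum_{i=1}^{N} f\big(x_i(t)\big),$$

and the ergodic hypothesis asserts that

$$\lim_{T\to\infty}\bar{f}_T = \lim_{N\to\infty}\langle f\rangle_N(t).$$

With $f = 1$ in jail and $f = 0$ otherwise, $\bar{f}_T$ is the fraction of one person’s first $T = 18$ years spent in jail. $\langle f\rangle_N$ is the share of the population in jail when the statistics were collected. Even where the two limits agree, nothing guarantees agreement at the finite $T$ we actually care about.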
Ergodicity and time scales
"What do you want with that baseball bat? I told you I’ll get you your money as time goes to infinity!"
… will not keep the mob off your back for long, even if you’re telling the truth.
The ergodic hypothesis is designed for so-called "fast" systems. These are systems in which each trajectory (each person) explores all of its possible states (jail or no jail) over time scales that are short compared to the time scale of measurement. In our example, this would be the case if David and Luigi were each thrown in jail twice a month for a few minutes. Since we only care about where they spent the first 18 years of their lives, saying Luigi spent 0.01% of his time in jail would then be good enough (if that were true). Of course, nothing like that happens in this example. Relax, your friends David and Luigi don’t even know what a jail looks like.
In reality, instead of David and Luigi rotating in and out of jail all the time, there are a small number of people who spend far more than their fair share of time behind bars (the word "fair," as often in a probabilistic context, has various meanings here).
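Here is a toy sketch of the contrast between the two previous paragraphs. All numbers are made up for readability: 1,000 people, 1,000 time steps, and a 1% jail rate rather than the article’s 0.09% and 0.01%. Both schedules, `fast_rotation` and `concentrated`, are hypothetical. In the first, everyone is briefly jailed every 100 steps, as in the "fast" scenario. In the second, a tenth of the population accounts for all jail time.

```python
# Toy comparison: identical ensemble snapshots, very different life paths.
# All parameters are illustrative assumptions, not data from the article.
N_PEOPLE = 1000
N_STEPS = 1000  # time steps along each life path

def fast_rotation(person: int, t: int) -> bool:
    """Hypothetical: everyone is jailed for one step in every 100, staggered."""
    return (t + person) % 100 == 0

def concentrated(person: int, t: int) -> bool:
    """Hypothetical: only people 0-99 are ever jailed, each for one block of 100 steps."""
    return person < 100 and t // 100 == person % 10

def snapshot_fraction(in_jail, t: int) -> float:
    """Ensemble view: share of the population in jail at a single instant t."""
    return sum(in_jail(p, t) for p in range(N_PEOPLE)) / N_PEOPLE

def time_fraction(in_jail, person: int) -> float:
    """Temporal view: share of one person's time steps spent in jail."""
    return sum(in_jail(person, t) for t in range(N_STEPS)) / N_STEPS

for name, model in [("fast rotation", fast_rotation), ("concentrated", concentrated)]:
    snap = snapshot_fraction(model, t=500)
    per_person = sorted({time_fraction(model, p) for p in range(N_PEOPLE)})
    print(f"{name}: snapshot {snap:.1%}, distinct per-person time fractions {per_person}")
```

Both schedules report the same 1% snapshot, and that is all an ensemble statistic can see. Only under fast rotation does that number describe an individual life. Under the concentrated schedule, nobody spends 1% of their time in jail: 90% of people spend none of it there, and 10% spend 10%.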
While pondering the fate of David and Luigi, it occurred to me that I should produce an example of an ergodic system, one where it’s ok to switch between time and ensemble perspectives, just so we all know what that means. Almost nothing interesting is well modeled as ergodic, so the example will be boring. Here goes: your brain makes visual measurements on a time scale of about 20 milliseconds. If I switch between two images more slowly than this, you will notice the change. If I switch much faster, your brain starts averaging over time, and you will perceive something constant in time that contains both images. Aside: I don’t claim to know anything about brains. I’m just guessing this time scale because computer screens used to refresh their images at roughly 50 Hz (every 20 milliseconds) and seemed to flicker, while faster screens are nicer.
For Fig. 1, I’ve created four GIFs that switch between red and blue at increasing frequency. In the first two, we can clearly perceive the red and blue states as distinct: the characteristic time scale of the dynamics is longer than that of the measurement (our vision). The third GIF flickers a little, but, at least to my slow brain, it seems kind of purple. That’s because the characteristic time scale of the dynamics is now similar to, or shorter than, that of the measurement. The final GIF is simply purple. It shows the static color composed of red and blue with equal weight (RGB code 880088), and I’ve marked it 0 seconds because it’s like switching infinitely fast.
Fig. 1: Switching color between red and blue at different time scales.
For the first two images, saying "this is a purple square" leaves out information that’s relevant on the time scale of measurement. If we call the third square "purple," we’re also replacing a dynamic description ("it switches every 20 milliseconds between red and blue") with an average (ensemble or time) description. But because our brains are so slow, to us the square is meaningfully both blue and red "simultaneously," and the description "purple" is beginning to capture what we need to know.
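The squares can be mimicked numerically. This is a minimal sketch that assumes perception is a flat average of the displayed color over a 20 ms window, sampled every millisecond. That is my simplification of the guess above, not a model of vision. The switching periods are illustrative and are not those of the GIFs in Fig. 1.

```python
# Toy model: "perceived" colour = plain average of the displayed colour over a
# 20 ms window. The window length and flat averaging are assumptions.
RED = (255, 0, 0)
BLUE = (0, 0, 255)
WINDOW_MS = 20  # assumed integration time of vision

def displayed(t_ms: int, switch_ms: int) -> tuple:
    """Colour on screen at time t_ms when the square switches every switch_ms ms."""
    return RED if (t_ms // switch_ms) % 2 == 0 else BLUE

def perceived(start_ms: int, switch_ms: int) -> tuple:
    """Average RGB over one measurement window starting at start_ms (1 ms sampling)."""
    samples = [displayed(t, switch_ms) for t in range(start_ms, start_ms + WINDOW_MS)]
    return tuple(round(sum(c[i] for c in samples) / WINDOW_MS) for i in range(3))

def share_pure(switch_ms: int, starts=range(0, 2000, 5)) -> float:
    """Fraction of measurement windows that see pure red or pure blue."""
    views = [perceived(s, switch_ms) for s in starts]
    return sum(v in (RED, BLUE) for v in views) / len(views)

for switch_ms in (1000, 100, 20, 1):
    print(f"switch every {switch_ms:>4} ms: "
          f"{share_pure(switch_ms):.1%} of windows look pure red or blue")
```

With switching every second, 394 of the 400 sampled windows see a pure color. At 20 ms, only 100 of 400 do, so the perceived color keeps shifting, like the flicker of the third square. At 1 ms, no window is pure, and every window averages to (128, 0, 128), a constant purple: time average and ensemble (color-mix) description coincide.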
Long story short: probabilistic descriptions are dangerous territory. They may be ok for a system where
any single trajectory through time explores everything that might happen and
it does that so fast that, on the time scale we’re interested in, it’s as if everything is happening simultaneously.
For David and Luigi, that’s obviously not the case; for the red and blue squares, it can be.
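The second condition is about time scales. One way to quantify it is to model each person as a two-state Markov chain (free or in jail) and compare the chain’s correlation time, roughly how long it takes to forget its current state, with the 18-year measurement window. This is a standard textbook device, not something from the article. Both parameter sets below are hypothetical. Each puts 0.01% of people in jail at any moment, but one has one-hour stays and the other five-year stays.

```python
import math

# Measurement time scale: the first 18 years of life, in hours.
HOURS_18_YEARS = 18 * 365.25 * 24

def jail_chain(pi_jail: float, mean_stay_hours: float):
    """Hourly switching probabilities of a hypothetical two-state Markov chain.

    a = P(free -> jail), b = P(jail -> free). The chain's long-run fraction of
    time in jail is a / (a + b) = pi_jail, and a stay in jail lasts 1 / b hours
    on average.
    """
    b = 1.0 / mean_stay_hours
    a = b * pi_jail / (1.0 - pi_jail)
    return a, b

def correlation_time(a: float, b: float) -> float:
    """Hours for correlations to decay by a factor e: -1 / ln|1 - a - b|."""
    return -1.0 / math.log(abs(1.0 - a - b))

scenarios = {
    "one-hour stays": jail_chain(1e-4, 1.0),
    "five-year stays": jail_chain(1e-4, 5 * 365.25 * 24),
}
for name, (a, b) in scenarios.items():
    tau = correlation_time(a, b)
    # Expected jail entries per person in 18 years; a * T is a good
    # approximation because the chain is almost always in the free state.
    entries = a * HOURS_18_YEARS
    print(f"{name}: correlation time / 18 years = {tau / HOURS_18_YEARS:.1e}, "
          f"expected jail entries per person = {entries:.2g}")
```

With one-hour stays, the correlation time is a few minutes, and each person enters jail about 16 times. Every life path then samples both states, and its time average is close to the 0.01% snapshot. With five-year stays, the correlation time is over a quarter of the window, and a typical person never enters jail at all. The chain is still ergodic as time goes to infinity, but, as with the mob, that doesn’t help on the time scale that matters.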