When the Numbers Lie

Simpson’s Paradox and the Hidden Structure of Data
Causality
Statistics

“Together, one story. Apart, different ones.” | Road in Alentejo, Portugal


TL;DR

A drug can help men. It can help women. And yet help nobody. That’s Simpson’s Paradox: the statistical trap where trends in subgroups vanish or reverse when you combine the data. It’s not a math error; it’s a warning that numbers don’t interpret themselves. The takeaway: never trust an aggregate statistic without asking what’s hiding underneath.

The Full Story

As Portugal counts down to its presidential election in January 2026, I’ve found myself unable to escape the chorus of political commentators dissecting poll after poll. This election is unprecedented: 11 candidates running for the presidency, though the ballot will somehow list 14 names (a peculiarity that deserves a post of its own, in a different kind of blog). But what’s been nagging at me isn’t the crowded field. It’s something the analysts keep repeating like a mantra: “The numbers don’t lie.”

They acknowledge, almost dismissively, that different polls yield different results, owing to different timing, different sample sizes, and slightly different methodologies, but they always circle back to that reassuring phrase: “The numbers don’t lie.” And every time I hear it, something in my brain twitches. Because while it’s true that numbers don’t lie, they also don’t speak for themselves. They can tell contradictory truths simultaneously. They can show one thing in the whole and the opposite in the parts. This is what kept popping into my mind as I watched yet another commentator gesture at bar charts: Simpson’s Paradox.

This is a statistical phenomenon that, once you understand it, will forever change how you look at data. In fact, Simpson’s Paradox reveals something profound about the relationship between statistics, causation, and the stories we tell ourselves about evidence.

The Paradox in Action

Imagine a clinical trial for a new drug. You analyze the results and find:

  • Among men: the drug works better than placebo (61% vs 57% recovery)
  • Among women: the drug works better than placebo (44% vs 40% recovery)

So the drug helps everyone, right? Time to approve it?

Here’s the twist: when you pool all the data together, the drug shows no effect whatsoever, with exactly 50% recovery in both the treatment and control groups.

This isn’t a mathematical error. It’s Simpson’s Paradox: an association that holds in every subgroup can vanish - or even reverse - when you combine the data.
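
If you want to check the arithmetic yourself, here is a minimal sketch in Python. The post only states the percentages, so the exact counts below are my own assumption, chosen so every rate matches the figures above up to rounding:

```python
# Illustrative counts (my assumption; the text gives only percentages).
# Note: 27 of the 40 treated patients are women, while men dominate the
# control group -- this composition is what drives the paradox.
trial = {
    #  (sex,    arm):      (recovered, total)
    ("men",   "drug"):    (8, 13),   # 8/13  = 61.5%
    ("men",   "placebo"): (8, 14),   # 8/14  = 57.1%
    ("women", "drug"):    (12, 27),  # 12/27 = 44.4%
    ("women", "placebo"): (4, 10),   # 4/10  = 40.0%
}

def pct(recovered, total):
    """Recovery rate as a percentage."""
    return 100 * recovered / total

# Subgroup view: the drug wins in both strata.
for sex in ("men", "women"):
    d, p = trial[(sex, "drug")], trial[(sex, "placebo")]
    print(f"{sex}: drug {pct(*d):.1f}% vs placebo {pct(*p):.1f}%")

# Pooled view: the effect vanishes entirely (50% vs 50%).
for arm in ("drug", "placebo"):
    rec = sum(trial[(s, arm)][0] for s in ("men", "women"))
    tot = sum(trial[(s, arm)][1] for s in ("men", "women"))
    print(f"pooled {arm}: {pct(rec, tot):.1f}%")
```

Both subgroup comparisons favor the drug, yet the pooled rates come out dead even: the whole paradox in four dictionary entries.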

How Is This Possible?

The key lies in how people ended up in each group. In our example, women were more likely to receive the treatment (27 of 40 treated patients were women), while men dominated the control group. Since men recover at higher rates regardless of treatment, this imbalance masks the drug’s genuine effect: each pooled rate is a weighted average of the sex-specific rates, so the treatment group’s average is dragged down by its larger share of women, while the control group’s is propped up by its larger share of men.

The aggregate statistics aren’t lying, exactly. They’re just answering a different question than the one we think we’re asking.

Why This Matters Beyond Statistics

What fascinates me about Simpson’s Paradox isn’t the mathematics (that part is actually straightforward once you see it). What’s fascinating is what it reveals about human reasoning.

Judea Pearl (see my blog post The Book of Why), the computer scientist who revolutionized causal inference, argues that the paradox feels paradoxical because we instinctively conflate two very different things:

  1. Observational claims: “People who take this drug are more likely to recover.”
  2. Causal claims: “Taking this drug will make you more likely to recover.”

These sound similar but aren’t the same. The first describes a correlation in existing data. The second describes what would happen if we intervened - if we actually gave the drug to someone.

Pearl’s insight is that if we interpret the conditional probabilities causally (as describing what happens when we intervene), then Simpson-style reversals become impossible, provided the partitioning variable (like gender) isn’t itself affected by the treatment. Our intuition isn’t wrong - it’s just operating on causal assumptions that the raw statistics don’t warrant.
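
Pearl’s distinction can be made computational. Below is a minimal sketch of his backdoor adjustment, applied to the same assumed counts from the example above: to estimate the interventional quantity P(recovery | do(arm)), we average the sex-specific recovery rates weighted by each sex’s share of the whole sample, rather than conditioning on the lopsided arms. This is valid exactly under the condition just stated, i.e. gender influences treatment assignment and recovery but isn’t itself affected by the treatment:

```python
# Backdoor adjustment over the illustrative counts used earlier
# (my sketch, not code from Pearl). Gender is the confounder Z:
#   P(rec | do(arm)) = sum over z of P(rec | arm, z) * P(z)
counts = {
    ("men",   "drug"):    (8, 13),
    ("men",   "placebo"): (8, 14),
    ("women", "drug"):    (12, 27),
    ("women", "placebo"): (4, 10),
}

n_total = sum(total for _, total in counts.values())  # 64 patients
# Each sex's share of the whole sample, P(z):
p_z = {z: sum(counts[(z, a)][1] for a in ("drug", "placebo")) / n_total
       for z in ("men", "women")}

for arm in ("drug", "placebo"):
    adjusted = sum(counts[(z, arm)][0] / counts[(z, arm)][1] * p_z[z]
                   for z in ("men", "women"))
    print(f"P(recovery | do({arm})) = {100 * adjusted:.1f}%")
```

On these numbers the adjusted rates come out to roughly 51.7% for the drug versus 47.2% for placebo: ask the causal question, and the ~4-point benefit seen in each subgroup reappears in the aggregate.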

The Real-World Stakes

This isn’t just academic hairsplitting. Simpson’s Paradox has appeared in:

  • COVID-19 mortality data: Early in the pandemic, Italy’s overall case fatality rate exceeded China’s, yet within every age group, China’s rate was higher. The difference? Italy’s confirmed cases skewed much older.

  • University admissions: Berkeley’s famous 1973 admissions data showed men admitted at higher rates overall, yet department by department the apparent bias vanished, and several departments actually favored women. Women simply applied disproportionately to more competitive departments.

  • Educational testing: SAT scores rose in the US between 1992 and 2002, yet scores fell within every grade-point-average category. Grade inflation meant each category was losing its best students to the tier above.

In each case, the aggregate statistic tells a true but misleading story. The subgroup statistics tell a different true story. Neither is wrong - they are just answering different questions.

The Deeper Lesson

I think Simpson’s Paradox endures as a topic of fascination because it sits at the intersection of mathematics, psychology, and philosophy. It forces us to confront uncomfortable questions:

What are we actually asking when we ask if something “works”?

The drug example shows that “works” is ambiguous. Works for whom? Under what conditions? Compared to what counterfactual?

Can we ever trust aggregate statistics?

Yes, but only if we understand the causal structure generating them. A well-designed randomized trial eliminates the paradox by ensuring treatment assignment is independent of confounding factors. The mathematics doesn’t change; we just control which mathematics applies.
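
As a sanity check on that claim, here is a small simulation sketch (assuming, purely for illustration, a 50/50 population of men and women with the recovery probabilities from the example). When a fair coin assigns treatment independently of sex, the pooled comparison tracks the genuine per-stratum benefit:

```python
import random

random.seed(42)

# Recovery probabilities per (sex, arm), from the example above.
P_RECOVER = {
    ("men",   "drug"): 0.615, ("men",   "placebo"): 0.571,
    ("women", "drug"): 0.444, ("women", "placebo"): 0.400,
}

N = 100_000
tally = {"drug": [0, 0], "placebo": [0, 0]}  # arm -> [recovered, total]

for _ in range(N):
    sex = random.choice(["men", "women"])     # assumed 50/50 population
    arm = random.choice(["drug", "placebo"])  # randomized, independent of sex
    tally[arm][1] += 1
    if random.random() < P_RECOVER[(sex, arm)]:
        tally[arm][0] += 1

for arm, (recovered, total) in tally.items():
    print(f"pooled {arm}: {100 * recovered / total:.1f}%")
# Prints roughly 53% for the drug vs 49% for placebo: with randomization,
# the pooled rates recover the ~4-point within-sex effect.
```

No clever adjustment needed: randomization breaks the link between sex and treatment assignment, so the naive pooled comparison and the causal answer coincide.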

Why does our intuition fail here?

Perhaps because in everyday reasoning, we rarely encounter situations where a variable (like gender) is simultaneously associated with both the cause (treatment assignment) and the effect (recovery). Or perhaps because we’ve evolved to think causally, and the statistical framing tricks us into applying causal intuitions where they don’t belong.

A Humbling Thought

Simpson’s Paradox reminds us that data doesn’t interpret itself. Behind every dataset lies a causal structure, i.e. a web of relationships determining who ends up where and why. Statistics can describe patterns in the data, but they can’t tell us which patterns reflect genuine causal relationships and which are artifacts of how the data were generated.

This is why I find the paradox oddly comforting. It’s a built-in reminder that analysis requires humility. The numbers are never the whole story. The whole story requires understanding the process that produced the numbers, and that understanding doesn’t come from the data alone.

Why Should Non-Scientists Care?

Every day, you’re bombarded with statistics claiming to prove something: that a diet works, that a policy failed, that one group outperforms another. Simpson’s Paradox reveals that these numbers can be simultaneously true and deeply misleading - not through anyone’s dishonesty, but through the hidden structure of how data gets generated.

Understanding this paradox is intellectual self-defense. It’s the difference between being manipulated by a cherry-picked aggregate statistic and asking the right question: “But what happens when you break it down by…?” Whether you’re evaluating a medical treatment, interpreting crime statistics, assessing school performance, or deciding whether some intervention actually caused an outcome, the paradox teaches a crucial lesson that extends far beyond mathematics: the story you see depends entirely on how you slice the data, and the “right” way to slice it depends on causal knowledge that the statistics themselves cannot provide.

In a world where data is increasingly weaponized to persuade, knowing that associations can reverse - that a drug can help everyone yet help no one, that discrimination can exist in the whole yet vanish in the parts - transforms you from a passive consumer of statistics into someone who knows which questions to ask.

The next time you see a headline claiming that some treatment “doesn’t work” or that some factor “has no effect,” ask yourself: in which subgroups? Under what conditions? And what’s the causal story that generated these statistics in the first place?

The BioLogical Footnote

Perhaps what makes Simpson’s Paradox so persistently unsettling is that it mirrors something we already know about life but prefer to ignore: context isn’t just important, it is constitutive. We want to believe that facts are facts, that a treatment either works or it doesn’t, that a policy either succeeds or fails. But the paradox whispers something more uncomfortable: the same intervention, the same choice, the same action can be simultaneously beneficial and harmful depending on where you stand. This isn’t relativism; the math is precise. It’s something stranger: a reminder that “what happened” is never separable from “to whom” and “under what circumstances.” The paradox doesn’t just challenge how we analyze data. It challenges the very notion that there’s a view from nowhere, a god’s-eye perspective from which the true effect simply is. Maybe the deepest lesson isn’t statistical at all: it’s that the question “does it work?” is incomplete until we finish the sentence.

To Explore Further

Simpson’s paradox | Stanford Encyclopedia of Philosophy
