Frances Coppola shows where the truth lies.
If there is one industry that has flourished under coronavirus, it is the statistics industry. Never have so many statistics been produced, charted, interpreted and analysed. And never have so many conflicting stories been generated from essentially the same information. By choosing carefully from the statistical smorgasbord, you can justify almost any course of action from complete reopening of the economy to indefinite lockdown.
Not only can you create any story you like, you can twist the statistics any way you like too, secure in the knowledge that most people don’t understand them anyway. You can plot unrelated figures on the same chart, conveniently forget to mention important differences between data sets, and wilfully misinterpret what a chart actually shows. Coronavirus chart crime is rampant.
One of the principal reasons why pandemic statistics are so open to abuse is that people don’t understand exponential mathematics. Early on in the pandemic, there were few infections and even fewer deaths. It was hard to believe that within a few weeks, deaths would be in the thousands and infections in the tens of thousands. And it was not just ordinary people who were fooled by the slow rate of increase in the early part of the pandemic. Governments were, too. The British government, for example, did nothing to stamp on early cases, despite evidence from Italy that failing to do so would result in an out-of-control epidemic. And in the US, Donald Trump famously said “we have 15 cases and it will soon be zero”. How wrong he was.
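To see how quickly a handful of cases becomes thousands, here is a minimal sketch of unchecked exponential growth. The three-day doubling time is an assumption chosen for illustration, not a measured figure, and the starting count of 15 is borrowed from the quote above purely as a starting value.

```python
def cases_after(days, initial_cases=15, doubling_time_days=3):
    """Cases after `days` of unchecked growth.

    Both defaults are illustrative assumptions, not real data.
    """
    return initial_cases * 2 ** (days / doubling_time_days)


if __name__ == "__main__":
    for day in (0, 7, 14, 21, 28):
        print(f"day {day:2d}: about {cases_after(day):,.0f} cases")
```

Under these assumed values, 30 days means ten doublings, so 15 cases become 15 × 1,024 = 15,360. The first week looks harmless; the fourth does not.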
The way in which the figures were explained to the public made matters worse. Scientific gobbledegook proliferated. R-zero (truncated to R in the UK government’s daily updates), in particular, was widely misunderstood. It is the average number of people that one infected person will infect in a population where everyone is susceptible to infection, but the last part of this was quickly forgotten. And it was misused by opponents of lockdowns. This chart, for example, by an analyst at J.P. Morgan, purported to show that in most US states, R-zero had fallen after the lockdown was lifted, implying that the lockdowns had been futile (Fig 1).
But this chart doesn’t show what R-zero was before the lockdowns. It only shows us that after the lockdowns were lifted, R-zero was less than 1 in every state. Once R-zero is below 1, the number of new infections inevitably keeps falling, since every infected person is infecting fewer than one other person on average. As the whole point of lockdown is to reduce R-zero to the point where the virus can no longer spread, this chart could therefore equally be taken as evidence that lockdowns work.
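A rough sketch of why a reproduction number below 1 means a shrinking outbreak: each "generation" of infections is the previous one multiplied by that number. The function name, the starting figure of 1,000 and the example values are all hypothetical, and the model ignores everything that makes real epidemics messy.

```python
def new_infections_by_generation(r, initial=1000, generations=6):
    """New infections per generation when each case infects `r` others on average.

    A deliberately simplified model with made-up inputs.
    """
    counts = [float(initial)]
    for _ in range(generations):
        counts.append(counts[-1] * r)
    return counts


if __name__ == "__main__":
    for r in (0.8, 1.2):
        rounded = [round(n) for n in new_infections_by_generation(r)]
        print(f"reproduction number {r}: {rounded}")
```

With any value below 1 the sequence falls generation after generation; with any value above 1 it climbs.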
As the pandemic gathered strength, statisticians started presenting data on logarithmic scales. Here’s an example from the FT (Fig 2).
Notice the labelling on the y axis. The spacing is not even. Rather, the numerical difference between each label increases as you go higher up the chart.
There was good reason to present the data in this way: on the more familiar linear scale, exponential rates of growth shoot up like rockets, quickly making the charts very large. But to people who didn’t understand log scales, the rates of growth appeared much lower than they actually were. This created a great deal of confusion and made it harder to justify strict virus control measures.
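For readers unsure what a log scale does, this small sketch may help. It places values on an axis by their base-10 logarithm, so labels of 10, 100, 1,000 and 10,000 sit at equal distances, and a series that doubles each step climbs by the same amount every time. The numbers are invented for illustration.

```python
import math


def log10_positions(values):
    """Where each value would sit on a base-10 logarithmic axis."""
    return [math.log10(v) for v in values]


def gaps(positions):
    """Distance between each pair of neighbouring positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    axis_labels = [10, 100, 1000, 10000]
    print("label gaps:", gaps(log10_positions(axis_labels)))

    doubling_series = [100 * 2 ** k for k in range(6)]  # invented figures
    print("step gaps: ", gaps(log10_positions(doubling_series)))
```

The doubling series is a straight line on such a chart, which is exactly why steady exponential growth can look undramatic to an untrained eye.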
To make matters worse, the UK government started talking about “flattening the curve”. This was a nice catchy phrase with a strong visual image, which presumably the government’s communications gurus thought people would understand better than “stopping the death rate from increasing”. But when you “flatten the curve” on a log scale, the death rate is still rising exponentially, just at a shallower angle. The curve has to be heading downwards for the death rate to have stopped increasing. So at about day 25, people looked at a chart like this and exclaimed, “ooh look, the curve is flattening, the virus is on its way out!” The curve was indeed flattening, but the virus certainly wasn’t on its way out. The death rate was still rising.
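The point can be checked with a toy series. In this sketch the daily growth rate drops from 30% to 10% after ten days; both rates and the starting value are assumptions made up for the example. The slope on a log chart becomes shallower, yet the numbers never stop rising.

```python
import math


def simulate(start, daily_growth_rates):
    """Build a series by applying each day's growth rate in turn."""
    series = [start]
    for rate in daily_growth_rates:
        series.append(series[-1] * (1 + rate))
    return series


def log_slopes(series):
    """Day-to-day slope of the series as it would appear on a log10 chart."""
    return [math.log10(b / a) for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    rates = [0.30] * 10 + [0.10] * 10  # hypothetical "flattening" after day 10
    deaths = simulate(100, rates)
    slopes = log_slopes(deaths)
    print(f"slope before: {slopes[0]:.3f}, slope after: {slopes[-1]:.3f}")
    print(f"day 10: {deaths[10]:,.0f}, day 20: {deaths[20]:,.0f}")
```

A flatter line here means slower exponential growth, not decline.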
Another problem was plotting countries at different stages of the pandemic on the same chart. This flattered the statistics from Sweden, which had famously decided not to lock down. For weeks, people pointed to Sweden’s low death rates as evidence that the lockdowns were unnecessary. But in fact Sweden was simply not as far along the exponential infection curve as countries such as Italy and Spain. As Sweden’s death rates gathered pace, the anti-lockdowners went oddly silent.
There was a further problem with comparing country statistics. The virus does not respect lines on maps, so the spread of the virus didn’t match country borders. For this reason, the FT and others decided not to report infection and death rates as a proportion of total population for each country. But Americans complained that because their population was so much larger than that of individual European countries, not producing “per capita” figures made it look as if the virus was worse in the US than it was in Europe. As if there was some kind of competition.
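The arithmetic behind the "per capita" complaint is simple enough to sketch. The two countries and their figures below are entirely hypothetical; they only show how a ranking can flip once population is taken into account.

```python
def per_million(count, population):
    """Convert a raw count into a rate per million people."""
    return count * 1_000_000 / population


if __name__ == "__main__":
    # Invented figures for two imaginary countries.
    countries = {
        "Large country": {"deaths": 30_000, "population": 300_000_000},
        "Small country": {"deaths": 12_000, "population": 60_000_000},
    }
    for name, d in countries.items():
        rate = per_million(d["deaths"], d["population"])
        print(f"{name}: {d['deaths']:,} deaths, {rate:.0f} per million")
```

In this made-up case the larger country has more deaths in total but half the rate per million, so either country can be made to look worse depending on which figure is reported.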
After a while it became apparent that countries were counting infections and deaths in different ways, and sometimes changing the way they counted them, creating some amazing anomalies. France, for example, had a sudden jump in its death rates when it included care homes. And for a while the UK had two lines on its death chart, one for hospital deaths and the other for all deaths. The “all deaths” figure was some distance behind the hospital figure, because figures for deaths in care homes and the community had not been collected.
Since comparing reported deaths was clearly fraught with problems, statisticians took to reporting “excess deaths” – the number of deaths over and above the normal number for the time of year in that country. For the UK, this was considerably higher than the number of deaths from coronavirus actually recorded (Fig 3).
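As a rough illustration of the calculation, excess deaths can be estimated by comparing deaths observed in a period with the average for the same period in earlier years. The sketch below assumes a simple multi-year average as the baseline, which is only one of several possible methods, and every number in it is invented.

```python
def excess_deaths(observed, baseline_years):
    """Observed deaths minus the average of the same period in earlier years."""
    baseline = sum(baseline_years) / len(baseline_years)
    return observed - baseline


if __name__ == "__main__":
    # Invented weekly figures, not real UK data.
    previous_years = [10_000, 10_500, 9_500, 10_200, 9_800]
    observed_this_year = 18_000
    recorded_covid_deaths = 6_000

    excess = excess_deaths(observed_this_year, previous_years)
    print(f"excess deaths: {excess:,.0f}")
    print(f"not explained by recorded Covid-19 deaths: {excess - recorded_covid_deaths:,.0f}")
```

The gap between the two printed figures is the part that recorded coronavirus deaths do not account for.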
It’s not clear why this is, though there are two potential causes: one is that coronavirus deaths outside hospital are wrongly attributed to other causes because of lack of testing, and the other is that deaths from other causes really have risen. Unsurprisingly, those angry about the carnage in care homes insisted that the discrepancy was because deaths in care homes were being wrongly attributed to other causes, while those who opposed strict virus control measures equally vehemently claimed those measures were causing premature deaths. The same data supported totally different points of view.
And the chart crimes continue. On Friday 19th June, the Spectator’s Fraser Nelson tweeted this chart, which appears to show that infection rates in the UK are so low that the pandemic is effectively over (Fig 4).
But this chart shows lab tests, which are a tiny proportion of coronavirus tests. Actual infection rates are far higher.
Perhaps the nastiest misuse of coronavirus statistics, however, came in Nelson’s next tweet. “While many people still die *with* Covid-19, most have other health conditions,” he said. And he showed this chart as evidence that excess deaths were now pretty much normal for the time of year (Fig 5).
His implication was clear. The people whose deaths are now being attributed to Covid-19 would have died anyway of other causes. The pandemic is over.
The “with” versus “of” Covid-19 argument has been raging for some time. It is now firmly established that a high proportion of people whose cause of death is attributed to coronavirus had existing health problems that may have made them more vulnerable to the virus. Because of this, a small but vocal minority argue that deaths from Covid-19 are massively overstated. They say that those people died “with” Covid-19, but not “of” it.
But the existing health problems that increase vulnerability to coronavirus include things like diabetes, which for many people can be successfully managed for years on end with a combination of diet and medication. Actuaries working in the field of longevity have comprehensively debunked the notion that people with existing health problems were all at death’s door. But this poisonous idea nevertheless refuses to die. Now, there are demands for statisticians to separate out death figures for the elderly and people with existing health problems from the rest of the population, apparently so that we know what the spread of the virus is “in society as a whole”. But the elderly and people with existing health problems are part of society as a whole. Separating them out creates two classes of citizen: those who are deemed vulnerable to the virus, so must be “shielded,” and those who are not, so are free to live as they please.
And that brings me to the point of this post. Coronavirus statistics can be turned to any purpose. They can be used to justify both imposition of a lockdown and lifting of it. They can be used to justify locking up the elderly and the vulnerable indefinitely, at who-knows-what cost to their mental health. And like “the science”, which has been doing some very heavy lifting, they can be used to justify – or hide – disastrous policy mistakes. Statistics need to be reported accurately, used responsibly, and treated with caution.