Bad graph hunting

Much can go wrong when making data visualisations.

Anthony B. Masters

--

Creating graphs is a key way to communicate statistics. There are many steps to this process, and graphs have a grammar.

Studying bad examples is often a good way to learn. Through failure, we gain insight into the means of success.

Google cards

The Google card for reported Covid-19 surveillance deaths is inaccurate: its rolling average calculation is incorrect.

(Image: Google)

Their graph treats non-reporting days as if those values were missing. Reported deaths on days the agency does not update are, by definition, zero. For 21st April 2022, the seven-day rolling average should be about 239. On the last three days of that seven-day window, surveillance deaths by date of report were 682, 508, and 482.

Since the agency did not report on the other four days, those other figures are zero. To get the seven-day rolling average, we sum and divide by seven.
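As a quick check of that arithmetic, here is a minimal Python sketch. The three non-zero counts come from the text above. The variable names, and the placement of the zeros on the first four days of the window, are assumptions made for illustration.

```python
# Reported deaths for the seven days ending 21 April 2022.
# Non-reporting days are entered as zero, since nothing was reported.
# Assumption: the three reported figures fall on the last three days.
reported = [0, 0, 0, 0, 682, 508, 482]

# Sum the seven daily values and divide by seven.
seven_day_average = sum(reported) / 7

print(round(seven_day_average))  # 239
```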

(Image: UK Health Security Agency)

The Google graph performs a different calculation: it averages only the non-zero days. If their figures matched the UKHSA dashboard, that calculation would give about 557 (the same sum divided by three, not seven). The differences between Google's figures and the dashboard appear to arise from data quality issues.
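The gap between the two treatments can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Google computes its figure: the function `window_mean`, its `skip_missing` flag, and the use of `None` to mark non-reporting days are all hypothetical names chosen for this example.

```python
from typing import Optional, Sequence


def window_mean(values: Sequence[Optional[int]], skip_missing: bool) -> float:
    """Mean of one window; None marks a day with no published update."""
    if skip_missing:
        # Treat non-reporting days as missing: average only the days present.
        present = [v for v in values if v is not None]
        if not present:
            raise ValueError("no reported values in the window")
        return sum(present) / len(present)
    # Treat non-reporting days as zero: divide by the full window length.
    filled = [0 if v is None else v for v in values]
    return sum(filled) / len(filled)


# Seven days ending 21 April 2022, using the UKHSA figures quoted above.
week = [None, None, None, None, 682, 508, 482]

as_zero = window_mean(week, skip_missing=False)
as_missing = window_mean(week, skip_missing=True)

print(round(as_zero), round(as_missing))  # 239 557
```

Both results use the same total of 1,672 reported deaths. Only the divisor changes, from seven to three, which is why the second figure is more than double the first.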

As a statistic, deaths by reporting date have many limitations. Administrative delays create reporting cycles. Public health agencies may only publish on certain days. As a result, there are days with low or zero reported deaths, followed by days with many reported deaths. Public holidays can also create artefacts.

Moreover, the reported figure is not the number of people who died with Covid-19 “today”. It is the count of new deaths added to the surveillance system since the previous update.

It is better to look at surveillance deaths by the date those people died.

--

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.