Cumulative sums are not independent
Are China’s coronavirus statistics “too perfect”?
A financial news site said Chinese coronavirus figures were “too perfect to mean much”.
A severe statistical misunderstanding underlies this complaint. Cumulative sums are, by definition, not independent observations. This article shows how cumulative sums of pseudo-random data are also ‘too perfect’ by the same measure.
It all adds up
The financial investment magazine Barron's headlined:
China’s Coronavirus Figures Don’t Add Up. ‘This Never Happens With Real Data.’
The Chinese government submits statistics about the coronavirus to the World Health Organization. The Barron's article asserts that “a simple mathematical formula” describes cumulative deaths, and that this simple model has “very high accuracy”.
‘Cumulative’ means adding up as you go along. Imagine there were three deaths on the first day, and five on the second day. The cumulative total of deaths after two days is eight. On the third day, two more people die. The cumulative total becomes ten.
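The arithmetic above can be written as a short Python sketch. It uses only the three daily counts from the example; the variable names are illustrative choices, not taken from any data source.

```python
from itertools import accumulate

# Daily deaths from the worked example: three, then five, then two.
daily_deaths = [3, 5, 2]

# Each cumulative total is the previous total plus that day's count.
cumulative_deaths = list(accumulate(daily_deaths))

print(cumulative_deaths)  # [3, 8, 10]

# Every total after the first is built on the one before it,
# so the totals cannot be independent observations.
for day in range(1, len(daily_deaths)):
    assert cumulative_deaths[day] == cumulative_deaths[day - 1] + daily_deaths[day]
```

Because daily counts are never negative, the cumulative series can only stay level or rise. Each value carries all the earlier values inside it.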
Imagine we wanted to relate daily deaths to the average temperature in Wuhan. We would calculate how much of the variance in deaths is explained by the variation in temperature.
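As a rough sketch of that calculation, the Python below fits a straight line by ordinary least squares and reports the share of variance in the outcome that the line accounts for. The function name `explained_variance_share` and the four data points are hypothetical, chosen only to keep the arithmetic easy to follow. They are not real temperature or mortality figures. The sketch assumes a single explanatory variable and an outcome that is not constant.

```python
def explained_variance_share(explanatory, outcome):
    """Share of the variance in `outcome` explained by a least-squares line."""
    n = len(explanatory)
    mean_x = sum(explanatory) / n
    mean_y = sum(outcome) / n

    # Least-squares slope and intercept for a straight-line fit.
    sxx = sum((x - mean_x) ** 2 for x in explanatory)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(explanatory, outcome))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Variation left over after the fit, compared with the total variation.
    residual = sum((y - (intercept + slope * x)) ** 2
                   for x, y in zip(explanatory, outcome))
    total = sum((y - mean_y) ** 2 for y in outcome)
    return 1 - residual / total


# Made-up illustrative values, not real observations.
explanatory = [1, 2, 3, 4]
outcome = [1, 3, 2, 4]

share = explained_variance_share(explanatory, outcome)
print(round(share, 2))  # 0.64
```

Here the fitted line accounts for about 64% of the variation in the made-up outcome values, and the remaining 36% is left unexplained.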
In the jargon, statisticians call this value the coefficient of determination, or ‘R-squared’. The value goes from 0 to 1 (or 0% to 100%). If it is 1, the model fits…