Correlations and Time Series
A high correlation does not mean what you think it means.
Over 1,700 users shared a series of Twitter posts, which claimed “proof” that:
The vast majority of COVID deaths in England since July have been mislabelled false positive deaths.
The suggestion is that recorded COVID-19 deaths rose because of greater testing: the posts assert that increased testing led to many incorrect diagnoses. Their “proof” was a high correlation between two time series.
This article focuses on the statistical problems with measuring correlation between time series. Those problems are one reason, among others, that the posts’ conclusion is false.
Dr Craig, a pathologist, starts with this graph:
The graph gives no data source; it should be the PHE Coronavirus Dashboard. Dr Craig writes:
You will notice that the shape of the two curves are very similar. We can test this. The chart below demonstrates that since August 93% of the rise in deaths can be accounted for by the rise in the number of tests done in hospitals over the 28 days preceding.
I have never seen such a tight correlation in my career. Biology just isn’t like that. But there it is — 93%.
People should not analyse time series in this way.
Pearson’s correlation is a statistic that measures the strength of linear association between two variables. Two variables with a high positive correlation tend to increase together and decrease together.
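To make the definition concrete, here is a minimal sketch of Pearson’s correlation in Python with NumPy. The function name `pearson_r` and the toy numbers are illustrative assumptions, not data from the posts. The last example shows that the statistic only picks up linear association: a perfectly determined but symmetric curve can score zero.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: the co-variation of x and y, scaled to lie in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Toy values, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_rising = pearson_r(x, 2 * x + 1)     # exact rising line
r_falling = pearson_r(x, -3 * x)       # exact falling line
r_curve = pearson_r(x, (x - 3) ** 2)   # y depends fully on x, but not linearly

print(r_rising, r_falling, r_curve)
```

The hand-written function should agree with NumPy’s built-in `np.corrcoef(x, y)[0, 1]`, which is the more usual way to compute it.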
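Time series are not independent observations. Time connects the points: each value depends on the values before it, so two series that both drift or trend can show a high correlation even when neither has anything to do with the other.
This is why methods that assume independent observations, such as a plain Pearson correlation, are not appropriate for time series.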
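A small simulation illustrates the problem. The sketch below uses simulated random walks as a stand-in for any series whose points are connected through time; it does not use the testing or deaths data, and the series length, number of pairs and the 0.5 threshold are arbitrary choices. Each pair of walks is generated independently, so any correlation between them is spurious. The code compares the correlation of the raw series with the correlation of their day-to-day changes.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n_pairs, n_steps = 1000, 200     # arbitrary sizes for the illustration

# Each row is one series. A random walk is the running sum of independent steps.
walks_a = rng.standard_normal((n_pairs, n_steps)).cumsum(axis=1)
walks_b = rng.standard_normal((n_pairs, n_steps)).cumsum(axis=1)

def rowwise_pearson(a, b):
    """Pearson's r between matching rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

# Correlation of the raw series (levels).
r_levels = rowwise_pearson(walks_a, walks_b)

# Correlation of the step-to-step changes (first differences).
r_changes = rowwise_pearson(np.diff(walks_a, axis=1), np.diff(walks_b, axis=1))

# Share of unrelated pairs that still look "strongly" correlated.
share_levels = float(np.mean(np.abs(r_levels) > 0.5))
share_changes = float(np.mean(np.abs(r_changes) > 0.5))

print(f"|r| > 0.5 on levels:  {share_levels:.1%}")
print(f"|r| > 0.5 on changes: {share_changes:.1%}")
```

A sizeable share of the unrelated pairs pass the threshold when measured on levels, while almost none do once the series are differenced. Differencing is only one common remedy; the point here is that a high correlation between two raw time series is weak evidence on its own.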