A Spreadsheet of Errors
How can misusing Microsoft Excel induce analytical errors?
Using Microsoft Excel can cause major problems in statistical reporting and analysis. This article looks at several recent incidents of Excel errors.
The missing cases
Public Health England gathered SARS-CoV-2 swab test results from commercial firms. These results arrived as comma-separated values (CSV) files, a plain-text format with one record per line.
In an automated process, the agency used Excel to pull together these text-based files. For each SARS-CoV-2 test, there were several rows in the file.
The agency used an old Excel file format (XLS). That meant each collated file could hold only 65,536 rows. Microsoft superseded that format in 2007. The row limit in the current format (XLSX) is 1,048,576 rows.
When the data exceeded the row limit, the extra rows were dropped. The result was temporary under-reporting of lab-confirmed cases. These reported cases feed into the NHS Test and Trace system, so the problem delayed attempts to control the virus.
This was not a “glitch”, but the inevitable outcome of the automated process.
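To illustrate how an automated collation step could guard against this kind of loss, here is a minimal sketch in Python. It is not the agency's actual process: the names `collate_rows` and `RowLimitExceeded` are hypothetical, and the sketch assumes the inputs are plain CSV text. The idea is to count rows as they are read and stop with an error when the total would not fit in the target worksheet, rather than letting the surplus disappear.

```python
import csv
from typing import Iterable, List, TextIO

XLS_MAX_ROWS = 65_536       # row limit of a legacy .xls worksheet (2**16)
XLSX_MAX_ROWS = 1_048_576   # row limit of an .xlsx worksheet (2**20)


class RowLimitExceeded(Exception):
    """Hypothetical error: the collated data will not fit in the target sheet."""


def collate_rows(sources: Iterable[TextIO],
                 max_rows: int = XLS_MAX_ROWS) -> List[List[str]]:
    """Read every CSV source and fail loudly if the total exceeds max_rows."""
    rows: List[List[str]] = []
    for source in sources:
        for row in csv.reader(source):
            rows.append(row)
            if len(rows) > max_rows:
                # Stop instead of truncating, so no record is lost unnoticed.
                raise RowLimitExceeded(
                    f"collated data exceeds {max_rows} rows; refusing to truncate"
                )
    return rows
```

A caller would pass open file objects, for example `collate_rows([open(p, newline="") for p in paths])`, and treat `RowLimitExceeded` as a signal to split the output or switch to a format with a higher limit.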