The case of the Morgan Stanley graph
The Covid-19 pandemic has made visualising data an urgent task. That flurry of activity has produced some great graphs and some poor ones.
One example of bad graphing appears to come from Morgan Stanley. The graph shows intensive care patients in ‘closed’ and ‘open’ US states. There is a linear smoothing line, with confidence intervals.
I am unable to find the original report. The earliest record I can find is on Twitter, shared by an academic at Yale School of Medicine.
The implicit conclusion is that restrictions failed to work, as the number of intensive care patients rose. One key problem is that the linear model smooths over a clear jump in occupancy numbers.
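To see why a straight line misleads here, consider a minimal sketch with invented numbers (not the graph's data): a series that is flat, jumps once, and is flat again. The helper `fit_line` is a hypothetical ordinary least squares fit written for this illustration, not the method used in the original graph.

```python
# Hypothetical series: flat at 100, then a one-off jump to 1000 on day 5.
# These values are invented purely for illustration.
days = list(range(10))
values = [100.0] * 5 + [1000.0] * 5


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# One line through everything reports a steady upward trend.
slope_all, _ = fit_line(days, values)

# Fitting each side of the jump separately shows no trend at all.
slope_before, _ = fit_line(days[:5], values[:5])
slope_after, _ = fit_line(days[5:], values[5:])

print(f"whole series: {slope_all:.1f} per day")
print(f"before jump:  {slope_before:.1f} per day")
print(f"after jump:   {slope_after:.1f} per day")
```

The single fitted line turns one discontinuity into what looks like sustained daily growth, even though neither segment is rising.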
Where did this discontinuity come from? Analysis by data scientist Tristan Mahr shows later versions do not have this jump.
An archived version shows missing values for the state of New York. Adding up states while those figures are absent, and then once they appear, gives an artificial rise. In essence, the graph treated unavailable numbers as zero.
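The effect of treating missing values as zero can be sketched in a few lines. The state names, the counts, and the functions `naive_total` and `strict_total` below are all hypothetical, chosen only to illustrate the aggregation problem; they are not taken from the COVID Tracking Project files or from the original analysis.

```python
# Hypothetical daily ICU counts per state; None marks a day with no report.
# All numbers are invented for illustration.
icu_counts = {
    "State A": [100, 102, 101, 103],
    "State B": [None, None, 900, 890],
}
n_days = 4


def naive_total(counts, day):
    """Sum across states, silently treating a missing value as zero."""
    return sum(series[day] or 0 for series in counts.values())


def strict_total(counts, day):
    """Return None if any state is missing, so the gap stays visible."""
    day_values = [series[day] for series in counts.values()]
    if any(v is None for v in day_values):
        return None
    return sum(day_values)


naive = [naive_total(icu_counts, d) for d in range(n_days)]
strict = [strict_total(icu_counts, d) for d in range(n_days)]

print(naive)   # [100, 102, 1001, 993]
print(strict)  # [None, None, 1001, 993]
```

The naive total appears to leap roughly tenfold between the second and third day, although no state's count changed much. The strict total leaves the first two days blank, which makes clear that the data are incomplete rather than low.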
In the COVID Tracking Project files, the New York series starts on 7th May. In the absence of federal reporting, there was a voluntary effort to collate figures. Such projects often struggle against institutions that fail to publish statistics. Another major difficulty is publication in strange formats.
Analysts tried to get New York stats from press conference slides. The numbers of intensive care patients were instead in the governor’s press emails, and 7th May was the first day those analysts were on the email list. That led to the big blip among ‘closed’ states.