End of the Regression Line

Anthony B. Masters
Feb 16, 2019


In June 2018, The Economist published a graph showing EU membership referendum voting intention over time:

The Economist's graph correctly shows Remain ahead of Leave after the referendum. (Image: The Economist)

This article considers a replication attempt, and looks at pre-referendum polls.

In short

No leap for Leave: Looking at pre-referendum polling averages, there was no leap to Leave in June 2016.
Before and After: A continuous regression line runs the risk that the line’s position before the referendum date will be affected by data points afterwards.
Graphical transparency: The failure to replicate shows the importance of data visualisation teams sharing their scripts or snippets.

Polling before the referendum

The Economist’s graph seeks to show that polls conducted after the referendum now show Remain ahead of Leave.

Their graph gives the impression that there was a major leap in Leave’s share in the final month. That impression is likely to be misleading.

For instance, Carole Cadwalladr, a journalist for The Observer, wrote that we should “never forget this graph”, and “the graph above correlates exactly”. These claims have been shared on Twitter thousands of times.

Pre-referendum polls do not show such a clear trend. The rolling average of the Leave share, after excluding people who say they don’t know, rises irregularly rather than leaping. However, there was a shift from Leave to Remain in the last week.

The final six polls showed Remain at 52% on average. (Image: What UK Thinks EU)

In the final four weeks, there were more polls with Leave leading than with Remain ahead. Two of the final six polls showed Leave ahead (conducted by Opinium and Kantar). The average Remain share for these six polls was 52%.
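The rolling average described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by What UK Thinks EU: the function names, the window of six polls, and the poll figures are all assumptions made up for the example.

```python
from collections import deque


def leave_share_excluding_dk(remain, leave):
    """Leave share as a percentage of people who expressed a preference."""
    return 100.0 * leave / (remain + leave)


def rolling_mean(values, window):
    """Mean of the most recent `window` values at each position."""
    recent = deque(maxlen=window)
    out = []
    for value in values:
        recent.append(value)
        out.append(sum(recent) / len(recent))
    return out


# Made-up (remain, leave) headline percentages in date order; NOT real polls.
polls = [(44, 40), (42, 42), (45, 41), (41, 45), (43, 44), (46, 42)]
shares = [leave_share_excluding_dk(r, l) for r, l in polls]
smoothed = rolling_mean(shares, window=6)
```

Dividing by the sum of Remain and Leave is what "excluding people who say they don't know" means here: the two shares are rescaled to add up to 100%.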

A replication attempt

Drawing data from the What UK Thinks EU website, I attempted to replicate The Economist’s graph using RStudio Cloud. I used a LOESS (locally estimated scatterplot smoothing) curve, which fits a weighted regression around each point to guide a signal through the noise of a scatter plot.
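To show what a smoother of this kind does, here is a simplified sketch in Python. It is not the R code behind the graphs in this article: it fits a local straight line with tricube weights, and the function names, the span of 0.75, and the data are assumptions for illustration only.

```python
def tricube(u):
    """Tricube kernel: 1 at the centre, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_point(x0, xs, ys, span=0.75):
    """Weighted local linear fit evaluated at x0 (a simplified LOESS)."""
    # Number of neighbouring points that receive non-zero weight.
    k = min(len(xs), max(2, round(span * len(xs))))
    # Bandwidth: distance to the k-th nearest point, slightly inflated so
    # that point keeps a small positive weight; fall back to 1.0 if zero.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.001 or 1.0
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Made-up data: days since an arbitrary start, and Remain shares (%).
days = [0, 10, 20, 30, 40, 50, 60, 70]
shares = [49.0, 51.0, 48.5, 50.5, 52.0, 50.0, 53.0, 52.5]
curve = [loess_point(d, days, shares) for d in days]
```

Because every fitted value borrows from its neighbours on both sides, points late in the series influence the curve earlier on. That property is what matters for the argument that follows.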

My graph was similar, but I could not reproduce the large leap in the Leave line before 23rd June 2016.

Polling companies changed their methods after the EU referendum, including using people's recall of that vote as part of their weighting procedures. Because a continuous smoother draws on nearby points on both sides of a date, there is a risk that the position of the line before the referendum will be affected by polling data collected after the referendum took place. This change in methods should be recognised in our graph.

We can introduce a discontinuity at the referendum date, fitting the lines before and after it separately, and show the uncertainty of where to draw our lines.
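A discontinuity of this kind amounts to splitting the polls at the referendum date and fitting each side on its own. The sketch below is an illustration under stated assumptions: the function names are hypothetical, polls dated on 23rd June 2016 itself are treated as "before", the data are made up, and a plain average stands in for whatever curve fitter is used.

```python
from datetime import date

REFERENDUM = date(2016, 6, 23)


def split_at_referendum(polls, cutoff=REFERENDUM):
    """Partition (date, share) pairs into before and after groups.

    Assumption: a poll dated on the cutoff itself counts as 'before'.
    """
    before = [(d, s) for d, s in polls if d <= cutoff]
    after = [(d, s) for d, s in polls if d > cutoff]
    return before, after


def fit_with_break(polls, smoother, cutoff=REFERENDUM):
    """Apply `smoother` to each side separately, so neither sees the other."""
    before, after = split_at_referendum(polls, cutoff)
    return smoother(before), smoother(after)


def mean_share(points):
    """Stand-in smoother: the average share of the points it is given."""
    return sum(s for _, s in points) / len(points)


# Made-up (date, Remain share %) pairs; NOT real polls.
polls = [
    (date(2016, 6, 1), 48.0),
    (date(2016, 6, 20), 50.0),
    (date(2016, 7, 10), 52.0),
    (date(2016, 9, 1), 54.0),
]
before_fit, after_fit = fit_with_break(polls, mean_share)
```

The design point is that the pre-referendum fit never receives a post-referendum data point, so later polls cannot move the earlier line.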

In the graphs so far, the only post-referendum data I used were repetitions of the EU referendum question. We can add in other variants, such as the YouGov Eurotracker data.

Finally, we can remove people who said they did not know, or would not vote. With only two options left, the Leave share is 100% minus the Remain share, so we can just show the Remain line.

Decisions and Transparency

As we can see, I made several choices about how to present EU referendum polling data, both before and after 23rd June 2016.

Every analyst makes choices when visualising data. These choices affect the final graph. Given the desire to show how public opinion has shifted after June 2016, the choice to include polling data from before the UK’s EU referendum must be questioned.

Given how important graphs are for telling a story, data visualisation teams should share their scripts or snippets.

The data was drawn from the What UK Thinks EU website: the EU referendum question (asked before and after the vote), questions asking people how they would vote in another referendum, and the YouGov Eurotracker series.

The compiled data may be downloaded from a Google Sheet. I have published an RPubs page containing the R code.

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.
