Multi-level regression and post-stratification (MRP) is a statistical technique used to project national polling data onto electoral districts, such as constituencies or regions. This matters in elections where the geographic distribution of the vote is crucial, such as US presidential elections (decided state by state through the Electoral College) or elections to the House of Commons (decided constituency by constituency).
Remain United commissioned an MRP projection, using internet panel polling data from ComRes, for European Parliament voting intention. ComRes interviewed 4,060 GB adults between 1st and 7th May 2019.
Response Rates and Non-Response Bias
The document starts by recounting that the last two general elections and the EU referendum “were not predicted well by most pollsters”.
Industry response rates are now very low, with most people refusing to take part. Those who do participate are unusual, and the sample becomes unrepresentative, which creates inaccuracy.
Non-response bias is an issue in survey research where the respondents are substantially different from those who were selected but failed to respond.
It is then asserted:
They might have worked well in the past when samples were only a little skewed, but they groan under the weight of modern mis-sampling. That results in polling errors.
Whilst the 2015 General Election polling miss was caused by unrepresentative samples, the same explanation does not really apply to 2017, when the error was not systemic. Furthermore, a 2008 meta-analysis found only a weak relationship between response rates and non-response bias. Non-response bias is not solely a function of the non-response rate.
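The last point can be made concrete with the standard decomposition of non-response bias: the bias in the respondents' average equals the non-response rate multiplied by the gap between respondents and non-respondents. The sketch below is illustrative only; the function name and all figures are hypothetical and are not taken from any poll discussed here.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean relative to the full selected sample.

    full_mean = r * mean_resp + (1 - r) * mean_nonresp, so
    bias = mean_resp - full_mean = (1 - r) * (mean_resp - mean_nonresp).
    """
    nonresponse_rate = 1.0 - response_rate
    return nonresponse_rate * (mean_respondents - mean_nonrespondents)


# Hypothetical figures: share intending to vote for some party.
# Very low response rate, but respondents resemble non-respondents.
low_rate_small_gap = nonresponse_bias(0.10, 0.41, 0.40)

# Much higher response rate, but respondents differ sharply.
high_rate_large_gap = nonresponse_bias(0.60, 0.45, 0.35)

print(f"10% response, 1-point gap:  bias = {low_rate_small_gap:.3f}")
print(f"60% response, 10-point gap: bias = {high_rate_large_gap:.3f}")
```

In this made-up comparison the survey with the far lower response rate has the smaller bias, because the bias depends on how different the non-respondents are as well as on how many of them there are.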
Exit polling is not ‘naturally representative’
The document then states:
One exception is exit polling which has been fairly accurate recently. Exit polls are based on counting votes after they have been cast, and its samples are naturally representative. So, it is insulated from the problems of pre-election polling.
In the UK, results from polling stations are not generally published. The Curtice-Firth method, used in exit polling since 2001, looks at the same polling stations from election to election. People are asked to replicate the vote they just cast, on a mock ballot paper.
The change between successive exit polls at those stations is then modelled, and a probabilistic forecast for the entire House of Commons is built by applying the estimated change to the previous election results.
These samples are not “naturally representative”. Rather, the method relies on the errors from polling station selection being broadly consistent between elections, so that they largely cancel when the change is estimated.
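A toy calculation shows why estimating change can work even with an unrepresentative set of polling stations. This is a sketch of the cancelling logic only, not the Curtice-Firth model itself, which works at the level of individual polling stations and constituencies. It assumes, for illustration, that the selection error is a fixed additive amount at both elections; all figures are hypothetical.

```python
def observed_share(true_share, station_bias):
    """Exit poll share at the sampled stations under an additive selection bias."""
    return true_share + station_bias


# Hypothetical: the sampled stations lean 4 points towards a party,
# and do so equally at both elections.
station_bias = 0.04
true_previous = 0.36   # known from the previous official result
true_current = 0.40    # unknown on election night

exit_previous = observed_share(true_previous, station_bias)
exit_current = observed_share(true_current, station_bias)

# Reading the current exit poll as a level inherits the full bias.
level_error = exit_current - true_current

# Estimating the change between exit polls cancels the shared bias.
estimated_change = exit_current - exit_previous
projected_current = true_previous + estimated_change

print(f"error if read as a level: {level_error:+.3f}")
print(f"estimated change:         {estimated_change:+.3f}")
print(f"projected current share:  {projected_current:.3f}")
```

If the selection error shifted between elections, it would no longer cancel, which is why consistency matters more here than representativeness.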
MRP is not an algorithm
The document continues:
Already well-known in the technology industry, where it is called “machine learning”, it is a key part of algorithms used for internet search and language translation. Market researchers have given it the clunky name of multi-level regression and post-stratification, or MRP for short, but it is essentially the same technique.
MRP takes polling data and seeks to build a model for individual voting intention out of demographic characteristics. This is the R: regression.
That model is allowed to vary according to the constituency. This is the M: multi-level.
The model of individual vote intention is then projected across census counts of what kind of people live in each place. This is the P: post-stratification.
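The three parts can be sketched in a few lines. The example below assumes the regression has already been fitted, so the coefficients are simply written in by hand. The constituency names, age bands, coefficients and census counts are all hypothetical, and a real model would use many more demographic variables and constituency-level predictors.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical fitted coefficients on the log-odds scale.
INTERCEPT = -0.4
# R: regression of vote intention on demographics (here, age only).
AGE_EFFECT = {"18-34": 0.5, "35-64": 0.0, "65+": -0.6}
# M: the model varies by constituency (here, a varying intercept).
CONSTITUENCY_EFFECT = {"Northtown": 0.3, "Southvale": -0.2}

# Hypothetical census counts of adults per age band in each constituency.
CENSUS = {
    "Northtown": {"18-34": 30000, "35-64": 40000, "65+": 10000},
    "Southvale": {"18-34": 10000, "35-64": 35000, "65+": 25000},
}


def predicted_support(constituency, age_group):
    """Modelled probability that one person of this type backs the party."""
    log_odds = (INTERCEPT
                + AGE_EFFECT[age_group]
                + CONSTITUENCY_EFFECT[constituency])
    return inv_logit(log_odds)


def poststratify(constituency):
    """P: average the predictions, weighted by census counts."""
    cells = CENSUS[constituency]
    total = sum(cells.values())
    weighted = sum(count * predicted_support(constituency, age)
                   for age, count in cells.items())
    return weighted / total


for name in CENSUS:
    print(name, round(poststratify(name), 3))
```

The poll supplies the relationship between characteristics and vote intention, and the census supplies how many people of each type live in each place. Every choice in the coefficient tables above is a modelling decision made by a person.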
There is no ‘machine-learning’ involved. It is not an algorithm in that sense. It is a statistical technique, named by political scientists.
MRP did not do better in 2015
The most confusing assertion made in the document is:
In 2015 and 2017, MRP would have been much more accurate than the flawed classic polls, even with moderate poll sizes of around 4,000 voters.
Nor is it the case that MRP-based models of Commons seats were wholly successful in the 2017 General Election. People may remember the YouGov model, but there was another organisation conducting such modelling.
Lord Ashcroft Polls used British Polling Council members for its fieldwork. Under an MRP model, the central projection was for the Conservatives to win 357 seats. After the large survey of 40,000 people that ‘started’ its model, Lord Ashcroft Polls used smaller weekly surveys of around 2,000 people. Compared with YouGov’s daily surveys of 8,000 people, this may have left the Lord Ashcroft Polls estimate too resistant to change.
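Some rough arithmetic illustrates how sample sizes could produce that resistance. Neither organisation's actual model is described here. The sketch assumes, purely for illustration, that a new wave is pooled with a 40,000-person baseline in proportion to sample size, and the vote shares are hypothetical.

```python
def new_wave_weight(baseline_n, wave_n):
    """Share of the pooled sample contributed by the newest wave."""
    return wave_n / (baseline_n + wave_n)


def pooled_estimate(baseline_share, baseline_n, wave_share, wave_n):
    """Sample-size-weighted average of the baseline and the new wave."""
    w = new_wave_weight(baseline_n, wave_n)
    return (1 - w) * baseline_share + w * wave_share


# Weight of one new wave against a 40,000-person baseline.
small_wave = new_wave_weight(40_000, 2_000)   # about 0.048
large_wave = new_wave_weight(40_000, 8_000)   # about 0.167

# Hypothetical: support was 44% in the baseline and 40% in the new wave.
print(round(pooled_estimate(0.44, 40_000, 0.40, 2_000), 4))
print(round(pooled_estimate(0.44, 40_000, 0.40, 8_000), 4))
```

Under this simple pooling assumption, a four-point fall in a 2,000-person wave moves the pooled figure by roughly a fifth of a point, so a genuine shift in opinion would take many waves to show up.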
MRP may be an expensive way of discovering systematic sampling bias.
The wider use of MRP is an important innovation for the polling industry. However, it is a matter of human learning, rather than machine learning.
Based on our understanding of public opinion, appropriate models for individual vote intention, including the choice of district-level predictor, must be selected. Those subjective choices must be clearly and transparently expressed for future assessment.