The sub-sample problem

A dearth of Scottish polls leads to desperation.

On social media, people often share survey research on vote intentions. On occasion, you may read estimates for Scotland that are not what they appear to be.

For example, the ‘UK Briefing’ account posted:

(Scotland) Westminster Voting Intention:
SNP: 37% (-11)
LAB: 25% (+5)
CON: 22% (+1)
LDEM: 10% (+6)
GREEN: 2% (-3)
Via [Redfield & Wilton Strategies], 22 Feb
Changes w/ 15 Feb.

Over 500 users shared this post, which shows a large change in intentions.

This is only a sub-sample. Redfield & Wilton Strategies polled 2,000 GB adults on 22nd February. Respondents answered online and were eligible voters in Great Britain. Researchers weighted estimates by age, gender, education, region, and 2019 vote. Vote intention shares also had turnout intention weighting.

One ‘region’ was Scotland, with 154 respondents. In the weighted sample, their responses counted for 180 adults.

There is large uncertainty around these estimates. Assuming a simple random sample, the SNP estimate has a margin of sampling error of about seven points. That means an approximate 95% confidence interval around the 37% figure runs from 30% to 44%.

These sampling errors are for a true proportion of 50%. (Image: Ipsos MORI)
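As a rough check on that arithmetic, here is a minimal sketch of the usual normal-approximation margin of error for a proportion. It assumes a simple random sample and a 95% level (z = 1.96). The function name is only illustrative. The post does not say which base the seven-point figure uses, so the sketch tries both the 154 unweighted respondents and the weighted base of 180.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion,
    assuming a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

snp_share = 0.37  # SNP estimate quoted in the shared post

# 154 = unweighted Scottish respondents, 180 = weighted base.
# 2000 = the full GB sample size, shown only to compare interval widths.
for n in (154, 180, 2000):
    moe = margin_of_error(snp_share, n)
    low, high = snp_share - moe, snp_share + moe
    print(f"n={n}: +/- {100 * moe:.1f} points "
          f"({100 * low:.1f}% to {100 * high:.1f}%)")
```

Either Scottish base gives a margin of roughly seven to eight points, several times wider than for a sample of 2,000.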

Most vote intention surveys use internet panels. Treating their respondents as a simple random sample is a polite fiction, and it can understate uncertainty.
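One common way to see how weighting alone can widen the uncertainty is Kish's approximate effective sample size, (Σw)² / Σw². The sketch below is illustrative only: the weights are made up, not taken from the Redfield & Wilton Strategies data, and this approximation is not something the pollster is known to report.

```python
import math

def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical weights for 154 respondents (assumed values, for illustration).
weights = [0.5] * 50 + [1.0] * 50 + [2.0] * 54

n_eff = kish_effective_sample_size(weights)

# Margin of error for a 37% share using the effective rather than raw sample size.
moe = 1.96 * math.sqrt(0.37 * 0.63 / n_eff)

print(f"respondents: {len(weights)}, effective sample size: {n_eff:.0f}")
print(f"margin of error: +/- {100 * moe:.1f} points")
```

With equal weights the effective sample size equals the number of respondents. The more the weights vary, the smaller it gets.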

Sub-samples may not have the correct weighting. Suppose we weight our numbers by gender and region. That means our weighted sample should contain the right numbers of women and Scots. It does not mean the weighted sample will contain the right number of Scottish women.
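A small sketch can make this concrete. It weights a made-up sample to hypothetical gender and region targets using raking (iterative proportional fitting), one common weighting method. The post does not say which method the pollster uses. All counts and targets here are invented for illustration.

```python
# Hypothetical respondent counts by (region, gender); not from any real poll.
sample = {
    ("Scotland", "women"): 3,
    ("Scotland", "men"): 12,
    ("Rest of GB", "women"): 47,
    ("Rest of GB", "men"): 38,
}

# Hypothetical population targets for each weighting variable.
region_target = {"Scotland": 0.10, "Rest of GB": 0.90}
gender_target = {"women": 0.50, "men": 0.50}

def rake(cells, region_target, gender_target, iterations=50):
    """Scale cell shares until both sets of marginal totals match the targets."""
    total = sum(cells.values())
    shares = {key: count / total for key, count in cells.items()}
    for _ in range(iterations):
        for axis, target in ((0, region_target), (1, gender_target)):
            # Current weighted total for each category on this axis.
            margin = {}
            for key, share in shares.items():
                margin[key[axis]] = margin.get(key[axis], 0.0) + share
            # Rescale every cell so this axis hits its target.
            shares = {
                key: share * target[key[axis]] / margin[key[axis]]
                for key, share in shares.items()
            }
    return shares

weighted = rake(sample, region_target, gender_target)

scots = weighted[("Scotland", "women")] + weighted[("Scotland", "men")]
women = weighted[("Scotland", "women")] + weighted[("Rest of GB", "women")]

print(f"Scots: {scots:.3f}, women: {women:.3f}")
print(f"Scottish women: {weighted[('Scotland', 'women')]:.3f}")
```

If half of Scots in the population were women, Scottish women would make up 5% of the total. The weighted sample matches both marginal targets, but its share of Scottish women stays well below that, because the weights never looked at the combination.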

There are other errors beyond sampling. In the total survey error framework, there are five types of non-sampling error:

  1. Validity: the survey is not measuring what the researcher intends. The survey asks the right question for the wrong concept.
  2. Coverage error: some people do not appear in the sampling frame. Another form of this error is duplication: units appearing more than once.
  3. Non-response error: those who do not answer the survey or question differ substantially from those who do.
  4. Measurement error: how researchers conduct the survey affects responses. That can include survey mode, question wording, and response options.
  5. Processing error: after collecting responses, there is some error in imputation, codes, and weights.

With surveys, there are many sources of uncertainty. Survey sub-samples have greater uncertainties.

You can find information about individual polls from the company websites. Clients will also report headline results. Please check the polling results with trustworthy sources before sharing.

Survation conducted a survey of 1,011 Scottish adults via their internet panel. Fieldwork ran on 25–26th February, on behalf of the Daily Record. The central estimate of the SNP’s Westminster vote intention share was 48%. That is the same as in Survation’s previous poll in mid-January.

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.
