The sub-sample problem
On social media, people often share vote intention surveys. Occasionally, you may read estimates for Scotland that are not what they appear to be.
For example, the ‘UK Briefing’ account posted:
(Scotland) Westminster Voting Intention:
SNP: 37% (-11)
LAB: 25% (+5)
CON: 22% (+1)
LDEM: 10% (+6)
GREEN: 2% (-3)
Via [Redfield & Wilton Strategies], 22 Feb
Changes w/ 15 Feb.
Over 500 users shared this post, which appears to show a large change in vote intentions within a week.
This is only a sub-sample. The Redfield & Wilton Strategies poll surveyed 2,000 adults in Great Britain on 22nd February. Respondents answered online and were eligible voters in Great Britain. Researchers weighted estimates by age, gender, education, region, and 2019 vote. Vote intention shares were also weighted by turnout intention.
One ‘region’ was Scotland, with 154 respondents. In the weighted sample, their responses counted for 180 adults.
There is large uncertainty around these estimates. Assuming a simple random sample, the SNP estimate has a margin of sampling error of about seven points at the 95% confidence level. That means an approximate confidence interval around the 37% figure runs from 30% to 44%.
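The arithmetic behind that margin can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple random sample and the usual normal approximation for a proportion. The function name `margin_of_error` is our own, and the choice between the unweighted base (154) and the weighted base (180) is an assumption shown both ways.

```python
import math

def margin_of_error(share, n, z=1.96):
    """Approximate 95% margin of sampling error for a proportion,
    assuming a simple random sample of size n."""
    return z * math.sqrt(share * (1 - share) / n)

snp_share = 0.37  # SNP estimate in the Scottish sub-sample

# Compare the unweighted (154) and weighted (180) Scottish bases.
for n in (154, 180):
    moe = margin_of_error(snp_share, n)
    low, high = snp_share - moe, snp_share + moe
    print(f"n={n}: +/- {moe * 100:.1f} points, "
          f"interval {low * 100:.1f}% to {high * 100:.1f}%")
```

Either base gives a margin of roughly seven to eight points, far wider than the margin for the full sample of 2,000.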
Most vote intention surveys use internet panels. Treating such a survey as a simple random sample is a polite fiction, and it can understate uncertainty.
Sub-samples may not have the correct weighting. Suppose we weight our numbers by gender and region. That means our weighted sample should contain the right numbers of women and Scots. It does not mean the weighted sample will contain the right number of Scottish women.
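A small sketch can make the weighting point concrete. All of the numbers below are hypothetical, chosen only for illustration: they are not taken from the Redfield & Wilton Strategies poll or from census figures. The weighted sample matches the targets for women and for Scots separately, yet still has too few Scottish women.

```python
# Hypothetical weighted counts per 1,000 weighted respondents,
# broken down by region and gender. Illustrative values only.
weighted = {
    ("Scotland", "women"): 20,
    ("Scotland", "men"): 80,
    ("Rest of GB", "women"): 480,
    ("Rest of GB", "men"): 420,
}

# Hypothetical population targets per 1,000 adults.
target_women = 500
target_scots = 100
target_scottish_women = 50

# Marginal totals: these are what the weights were designed to match.
women = sum(n for (region, gender), n in weighted.items() if gender == "women")
scots = sum(n for (region, gender), n in weighted.items() if region == "Scotland")

# Joint cell: nothing in the weighting scheme constrains this number.
scottish_women = weighted[("Scotland", "women")]

print("Women match target:", women == target_women)
print("Scots match target:", scots == target_scots)
print("Scottish women:", scottish_women, "against a target of", target_scottish_women)
```

In this made-up case, women make up only a fifth of the weighted Scottish sub-sample, even though both headline margins are correct.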
There are other errors beyond sampling. In the total survey error framework, there are five types of non-sampling error:
- Validity: the survey is not measuring what the researcher intends. The survey asks the right question for the wrong concept.
- Coverage error: some people do not appear in the sampling frame. Another form of this error is duplication: units appearing more than once.
- Non-response error: those who do not answer the survey or a question differ substantially from those who do.
- Measurement error: how researchers conduct the survey affects responses. That can include survey mode, question wording, and response options.
- Processing error: after collecting responses, errors arise in imputation, coding, or weighting.
With surveys, there are many sources of uncertainty. Sub-samples carry even greater uncertainty than the full survey.
You can find information about individual polls on the polling companies' websites. Clients will also report headline results. Please check polling results against trustworthy sources before sharing.