Sampling Error and Small Proportions

Sampling error arises from observing a sample (some of the population) rather than the full population. It is the inherent uncertainty that comes from running a survey rather than a census. We are usually interested in how much the survey estimate could plausibly differ from the population value.

There are three notable misunderstandings when people talk about surveys, such as political opinion polls: the sampling error of small proportions, the treatment of sub-samples, and total survey error. This article examines each in turn.

Margins of sampling error are not uniform

For some reason, the poll results were published to two decimal places: a spurious degree of precision that survey estimates of that size cannot offer.

However, the accompanying notes stated:

All of the numbers included are within the margin of error of the study (+/- 2.2%).

This figure applies to an estimate of 50%, which is where the margin of sampling error is at its highest. If we make the assumption of a simple random sample¹, an easy formula provides the margin of sampling error:
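Presumably this is the standard expression: for an estimated proportion p̂ from a simple random sample of size n, at 95% confidence,

\text{MOE}_{95\%} = 1.96 \times \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}

Plugging in p̂ = 0.5 and n = 2,000 gives about 0.022, matching the quoted ±2.2%.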

As the graph shows, the absolute margin of sampling error shrinks as the estimate moves away from 50%: it is much lower where our survey estimate is 1%.

Under that assumption, for a 95% confidence interval and a survey sample of 2,000, the margin of sampling error for a proportion of 1.4% would be about 0.5 percentage points.
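As an illustration, here is a minimal R sketch (not the published R Pubs code) that computes this margin of sampling error across estimates and reproduces the figures above:

# Margin of sampling error for a simple random sample of 2,000 people
library(ggplot2)

n <- 2000
p_hat <- seq(0.01, 0.99, by = 0.01)
moe <- 1.96 * sqrt(p_hat * (1 - p_hat) / n)   # 95% margin of sampling error

# Reproduce the figures quoted in the text
round(1.96 * sqrt(0.500 * 0.500 / n), 3)   # 0.022: +/- 2.2% at an estimate of 50%
round(1.96 * sqrt(0.014 * 0.986 / n), 3)   # 0.005: +/- 0.5% at an estimate of 1.4%

# Plot the margin of sampling error against the survey estimate
ggplot(data.frame(p_hat, moe), aes(x = p_hat, y = moe)) +
  geom_line() +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Survey estimate", y = "Margin of sampling error (95%)",
       title = "Simple random sample of 2,000 people")

The curve peaks at an estimate of 50% and falls towards the extremes, which is why applying a single quoted margin of error to every figure overstates the uncertainty for small proportions.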

Survey sub-samples

These sub-samples should be used indicatively. An actual survey of 18–24 year-olds may be different. (Photo: YouGov)

Sub-samples tend to be small, meaning their sampling error is large.

As an example, the YouGov/The Times poll conducted 2nd-3rd July 2019 had 1,605 GB respondents. The weighted sub-sample of Scotland had 138 people.
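As a rough guide (treating each as a simple random sample, which overstates the precision of a weighted sub-sample), those sample sizes imply very different margins of error for an estimate of 50%:

# Approximate 95% margins of sampling error at an estimate of 50%
round(1.96 * sqrt(0.5 * 0.5 / 1605), 3)   # 0.024: about +/- 2.4 points for the full GB sample
round(1.96 * sqrt(0.5 * 0.5 / 138), 3)    # 0.083: about +/- 8.3 points for the Scottish sub-sample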

Survey sub-samples are not ‘internally weighted’.

Imagine we weighted our survey to match the population in terms of gender and region. Whilst the weighting procedure ensures the right number of Scottish people and the right number of women, it does not ensure the weighted survey has the right number of Scottish women.
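A toy example in R, using hypothetical population shares (a 50/50 gender split, with roughly 9% of GB adults living in Scotland), shows how the marginal totals can match whilst a joint cell does not:

# Hypothetical population shares (percentages of GB adults)
target <- matrix(c(4.5, 4.5, 45.5, 45.5), nrow = 2,
                 dimnames = list(Sex = c("Women", "Men"),
                                 Region = c("Scotland", "Rest of GB")))

# A weighted survey with the same gender and region totals...
weighted <- matrix(c(2, 7, 48, 43), nrow = 2, dimnames = dimnames(target))
rowSums(weighted); rowSums(target)   # both 50 women, 50 men
colSums(weighted); colSums(target)   # both 9 Scotland, 91 Rest of GB

# ...yet the 'Scottish women' cell is 2% against a target of 4.5%
weighted["Women", "Scotland"]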

Consequently, a survey sub-sample may not be representative. Across many polls, we should expect these sub-samples to be representative on average. However, there is a lot of sampling variability, so a single sub-sample should not be over-interpreted.

Total survey error

Much can go wrong when producing a survey statistic. (Image: ResearchGate)

The total survey error framework identifies five types of non-sampling error:

  1. Validity: the survey is not measuring what it was intended to measure (such as through a poorly-designed question);
  2. Coverage error: some people (or units) that should be in the potential pool of respondents are omitted or duplicated;
  3. Non-response error: if people not answering the survey (or a particular question) substantially differ from those answering, an error arises;
  4. Measurement error: how the survey is conducted affects the recorded values (such as interviewers inadvertently influencing what people put);
  5. Processing error: once the survey responses are collected, errors can arise from mistakes made in imputation, encoding, and weighting.

¹Technically, survey estimates derived from opt-in internet panels do not have a margin of sampling error; quoting one is a polite fiction. Credible intervals may be calculated instead to show sampling variability.

The graph was produced using ggplot2 in R. The code may be read on R Pubs.

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.
