Systematic Sampling

What is systematic sampling?

There are many ways to take a sample from a population. One popular way is to take a systematic sample.

This article looks at this sampling method and the precision of the resulting estimates.

Taking a sample

Surveys help researchers answer questions about a population. A population is made up of units, such as people or businesses.

In a simple random sample, every possible sample of the same size has an equal chance of selection, so every unit has an equal and non-zero probability of selection. Common examples are bingo machines and lotteries: the selection machine gives each ball the same chance of appearing.
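Python's standard library can draw a simple random sample with `random.sample`. It picks distinct units without replacement, so every subset of the chosen size is equally likely. The sketch below assumes a population of 1,000 units numbered from 1. That numbering is illustrative only.

```python
import random

population = range(1, 1001)           # units numbered 1..1,000 (illustrative)
srs = random.sample(population, 100)  # 100 distinct units; every subset equally likely

print(len(srs), len(set(srs)))  # prints: 100 100
```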

In systematic sampling, every unit in the population is assigned a unique number, giving an ordered list. A systematic sample starts at a random point in that list, then picks units at a regular interval.

Suppose there were 1,000 tents at a festival, and the researcher wants to survey 100 of them. The sampling interval is 1,000 / 100 = 10. The researcher picks a random number between one and 10, such as seven. They then choose every tenth tent from that starting point: the 7th, 17th, 27th, 37th tents, and so on, up to the 997th. The sample then contains 100 tents.
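Here is the tent example as a short Python sketch. The function `systematic_sample` is a hypothetical helper. It assumes the population size is an exact multiple of the sample size, which holds here because 1,000 / 100 = 10.

```python
import random
from typing import List, Optional


def systematic_sample(population_size: int, sample_size: int,
                      start: Optional[int] = None) -> List[int]:
    """Return 1-based positions chosen by 1-in-k systematic sampling.

    Assumes population_size is an exact multiple of sample_size.
    """
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    interval = population_size // sample_size  # k = N / n
    if start is None:
        start = random.randint(1, interval)    # random start in 1..k inclusive
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")
    # Every k-th position from the random start onwards
    return list(range(start, population_size + 1, interval))


tents = systematic_sample(1000, 100, start=7)
print(tents[:4], tents[-1], len(tents))  # prints: [7, 17, 27, 37] 997 100
```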

In this example, the researcher picks every third person for the systematic sample. (Image: Scribbr)

There are several advantages to systematic sampling:

  • Easier to conduct: Systematic samples are simpler to construct than simple random samples.
  • Avoiding clusters: By chance, a simple random sample can select units that sit close together in the list. A systematic sample spreads its units evenly across the list, so this does not happen.

There are disadvantages too:

  • A known population size: we need to know, or approximate, the population size to set the sampling interval.
  • Need for natural randomness: there should be no hidden pattern in the numbered population. For example, an employee database could list people by team, with each team's most senior member first. If every team were the same size as the sampling interval, a systematic sample could include too many or too few senior employees.

If researchers fix the sampling fraction (such as 1 in 10) rather than the sample size, they do not need to know how big the population is in advance. This is why systematic sampling is common for web intercept surveys: the design invites a fixed fraction of website visitors, such as every tenth visitor, to answer questions.
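The sketch below shows a minimal web intercept. It assumes visitors arrive as a stream and that the site invites every k-th visitor after a random start. The function name `intercept` and its parameters are illustrative and do not come from any survey platform.

```python
import itertools
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def intercept(visitors: Iterable[T], interval: int,
              start: Optional[int] = None) -> Iterator[T]:
    """Yield every `interval`-th visitor from a random start in 1..interval.

    The total number of visitors is never needed: the stream can be endless.
    """
    if start is None:
        start = random.randint(1, interval)
    for position, visitor in enumerate(visitors, start=1):
        # Invite positions start, start + k, start + 2k, ...
        if position >= start and (position - start) % interval == 0:
            yield visitor


# An endless stream of visitor IDs: take only the first three invitations.
first_three = list(itertools.islice(intercept(itertools.count(1), 10, start=7), 3))
print(first_three)  # prints: [7, 17, 27]
```

A generator fits this case because it never looks ahead. Each visitor is either invited or not as they arrive.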

Another application of systematic sampling is forest inventory.

Practice and pedantry

Two questions arise about systematic sampling:

  • Does systematic sampling provide unbiased estimates?
  • What is the variance of those systematic sampling estimates?

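For simplicity, I will consider a score for which we want the mean. Provided the population size is a multiple of the sampling interval and the starting point is chosen at random, the systematic sample mean is an unbiased estimator of the population mean. Each unit belongs to exactly one of the possible samples, so the average of all the possible sample means is the population mean. Individual samples vary, but on average systematic samples produce the right central estimate.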
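This Python sketch checks the unbiasedness argument numerically. It lists every possible systematic sample mean and compares their average with the population mean. The helper `all_systematic_means` is hypothetical, and the standard Normal population is made up for illustration.

```python
import random
from statistics import fmean


def all_systematic_means(values, interval):
    """Means of all `interval` possible 1-in-k systematic samples.

    Start r (0-based) selects values[r], values[r + k], values[r + 2k], ...
    Assumes len(values) is an exact multiple of `interval`.
    """
    return [fmean(values[r::interval]) for r in range(interval)]


random.seed(1)
population = [random.gauss(0, 1) for _ in range(1000)]
sample_means = all_systematic_means(population, 10)

# Each start is equally likely, so the expected estimate is the plain average.
print(len(sample_means))                        # prints: 10
print(fmean(sample_means) - fmean(population))  # zero, up to rounding error
```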

The standard error of the sample mean is the square root of the variance of that mean across possible samples. It is an important statistic to understand, because it shows how close sample estimates are likely to be to the population mean.
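For reference, these are the standard textbook expressions. They assume the population size N is an exact multiple of the sample size n, with interval k = N/n. In them, Ȳ is the population mean, S² is the population variance with divisor N − 1, and ȳ_r is the mean of the systematic sample that starts at position r.

```latex
S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2

\operatorname{SE}_{\text{SRS}}(\bar{y}) = \sqrt{\left(1 - \frac{n}{N}\right)\frac{S^2}{n}}

\operatorname{SE}_{\text{sys}}(\bar{y}) = \sqrt{\frac{1}{k}\sum_{r=1}^{k}\left(\bar{y}_r - \bar{Y}\right)^2}
```

The factor (1 − n/N) is the finite population correction. The systematic formula averages over the k equally likely starting points.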

A common assumption is that a systematic sample behaves like a simple random sample. This assumption is used for calculations, such as the uncertainty surrounding each estimate. Crawford's 1997 book Marketing Research and Information Systems captured this sentiment:

However, because there is no conscious control of precisely which distributors are selected, all but the most pedantic of practitioners would treat a systematic sample as though it were a true random sample.

The sampling distribution depends on all possible samples. This distribution describes how estimates vary from sample to sample.

In general, there are far fewer possible systematic samples than possible simple random samples. In our earlier example, there were only ten potential starting points, so there were only ten possible systematic samples.
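Python's `math.comb` (Python 3.8 and later) makes the contrast concrete for the tent example. It compares the ten possible systematic samples with the number of ways a simple random sample can choose 100 tents from 1,000.

```python
from math import comb

N, n = 1000, 100        # tents at the festival, tents surveyed
k = N // n              # sampling interval

systematic_count = k    # one possible systematic sample per starting point
srs_count = comb(N, n)  # every subset of 100 tents is a possible simple random sample

print(systematic_count)     # prints: 10
print(len(str(srs_count)))  # number of decimal digits in the SRS count
```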

To investigate, I simulated 10,000 populations. Each population contained 1,000 observations from a standard Normal distribution. Because the populations are simulated, we know each population's actual mean and can use it for comparison. We can also calculate the population variance directly, rather than estimating it from the sample variance. That knowledge lets us calculate the variance between possible samples exactly, rather than estimate it.
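The article's simulation was written in R. The Python sketch below is an independent reconstruction of the same idea, not a port of that code. The function `standard_errors` is a hypothetical name. For a fully known population, it computes the exact standard error under systematic sampling (over all k starts) and under simple random sampling, with and without the finite population correction. It assumes the population size is a multiple of the interval, and it simulates 200 populations rather than 10,000 to keep the run short.

```python
import math
import random
from statistics import fmean


def standard_errors(population, interval):
    """Exact standard errors of the sample mean for a fully known population.

    Returns (systematic, SRS with FPC, SRS without FPC).
    Assumes len(population) is an exact multiple of `interval`.
    """
    N = len(population)
    n = N // interval
    mu = fmean(population)

    # Systematic sampling: each of the k starting points is equally likely,
    # so the variance is the mean squared deviation of the k sample means.
    means = [fmean(population[r::interval]) for r in range(interval)]
    se_sys = math.sqrt(fmean([(m - mu) ** 2 for m in means]))

    # Simple random sampling without replacement, using S^2 (divisor N - 1).
    s2 = sum((y - mu) ** 2 for y in population) / (N - 1)
    se_srs_fpc = math.sqrt((1 - n / N) * s2 / n)
    se_srs_no_fpc = math.sqrt(s2 / n)
    return se_sys, se_srs_fpc, se_srs_no_fpc


random.seed(2024)
n_pops = 200  # far fewer than the article's 10,000, to keep the sketch quick
smaller = 0
for _ in range(n_pops):
    pop = [random.gauss(0, 1) for _ in range(1000)]
    se_sys, se_srs, _no_fpc = standard_errors(pop, 10)
    if se_sys < se_srs:
        smaller += 1

print(f"Systematic SE below SRS SE in {smaller} of {n_pops} populations")
```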

The samples were of 100 units, or 1 in 10 of the population. For most simulated populations, the standard error was lower for systematic sampling than for simple random sampling.

The axes have different limits. Standard errors in simple random sampling were more consistent. (Image: R Pubs)

This tendency means that, in these conditions, treating a systematic sample as a simple random sample is likely to be conservative: it tends to overstate the standard error. This is a tendency, not an iron law. Leaving out the finite population correction makes the calculation even more conservative.
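A worked example shows the size of that effect. With the article's sample of n = 100 from N = 1,000, the finite population correction multiplies the simple random sampling standard error by:

```latex
\sqrt{1 - \frac{n}{N}} = \sqrt{1 - \frac{100}{1000}} = \sqrt{0.9} \approx 0.949
```

So omitting the correction inflates the standard error by a factor of about 1 / 0.949 ≈ 1.054, or roughly 5%.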

There is more variation in the results with noisy populations. (Image: R Pubs)

There are other methods for estimating the variance of a systematic sample mean. Those alternatives include using the differences between neighbouring units within the sample.
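One form of the difference approach is often called the successive difference estimator. The Python sketch below implements that form. The function name `successive_difference_se` is hypothetical, and the sketch assumes the sampled values are kept in their original list order.

```python
import math


def successive_difference_se(sample, population_size):
    """Standard error of a systematic sample mean via successive differences.

    v = (1 - n/N) * sum_{i=2..n} (y_i - y_{i-1})^2 / (2 * n * (n - 1))
    The sample must be kept in its original list order.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled units")
    fpc = 1 - n / population_size                              # finite population correction
    ssd = sum((b - a) ** 2 for a, b in zip(sample, sample[1:]))  # sum of squared neighbour differences
    return math.sqrt(fpc * ssd / (2 * n * (n - 1)))
```

The divisor 2(n − 1) works because, under a random ordering of the list, the squared difference between two distinct units has expectation 2S². The sum divided by 2(n − 1) therefore estimates S², and the rest mirrors the simple random sampling formula.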

When the variation within each possible systematic sample is large relative to the population as a whole, systematic sampling is more precise than simple random sampling. If there is some periodicity within the numbered population, this kind of sampling may be less efficient. Patterns within the numbered population influence how efficient systematic sampling is.
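The Python sketch below illustrates both cases with two made-up populations of 1,000 units and an interval of 10. In the first, the pattern repeats every 10 units, so each systematic sample has no internal variation. In the second, values rise steadily, so each systematic sample spans the whole range. The helpers `systematic_se` and `srs_se` are hypothetical and compute exact standard errors for a known population.

```python
import math
from statistics import fmean


def systematic_se(population, interval):
    """Exact SE of the systematic sample mean over all `interval` starts."""
    mu = fmean(population)
    means = [fmean(population[r::interval]) for r in range(interval)]
    return math.sqrt(fmean([(m - mu) ** 2 for m in means]))


def srs_se(population, n):
    """Exact SE of a simple random sample mean of size n, with the FPC."""
    N = len(population)
    mu = fmean(population)
    s2 = sum((y - mu) ** 2 for y in population) / (N - 1)
    return math.sqrt((1 - n / N) * s2 / n)


N, k = 1000, 10
periodic = [float(i % k) for i in range(N)]  # pattern repeats every k units
trend = [float(i) for i in range(N)]         # steady upward trend

for name, pop in [("periodic", periodic), ("trend", trend)]:
    print(name, round(systematic_se(pop, k), 3), round(srs_se(pop, N // k), 3))
```

In these two illustrations, the periodic list gives a systematic standard error roughly ten times larger than simple random sampling. The trend gives one roughly ten times smaller.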

The R code for the graphs is available on R Pubs and GitHub.

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.
