Benford’s Law and Election Data

Why do first digits of votes diverge from Benford’s distribution?

There are claims of statistical “proof” of election fraud in the United States. These claims often rest on the leading digits of vote counts, whose distribution does not match the Newcomb-Benford distribution.

An LBC radio host shared an article with flawed analysis, misapplying Benford’s Law.

Does this failure to match imply “fraud” or “manipulation”? No.

Benford’s Law is not universal — the data set needs certain properties. Electoral counts do not have these properties: we should not expect conformity.

What is Benford’s Law?

Numbers can be manipulated. To detect anomalies, we want an expected distribution for comparison.

In some data sets, the leading digit 1 appears much more often than the leading digit 9. That is, more numbers in the data set start with a 1 than a 9.

Here, 1 appears as a leading digit more often than 9. (Image: Significance)
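In its standard form, Benford’s Law says the expected proportion of numbers with leading digit d is log10(1 + 1/d). A minimal Python sketch of those expected proportions (the function name is illustrative):

```python
import math

def benford_first_digit_probs():
    """Expected share of each leading digit d = 1..9 under Benford's Law:
    P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, p in benford_first_digit_probs().items():
    print(f"{digit}: {p:.3f}")
```

This gives about 30.1% for a leading 1, falling to about 4.6% for a leading 9.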

The astronomer Simon Newcomb first described this ‘law’ in 1881, based on his use of logarithmic tables:

That the ten digits do not occur with equal frequency must be evident to any one making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones.

Frank Benford, a physicist, observed the same pattern in 1938. He showed that a diverse range of data sets approximate this predicted distribution:

(Image: Imperial College London/Adrien Jamain)

When does Benford’s Law apply?

Despite being called a ‘law’, it is not universal: it is an observation about some types of data sets. William Goodman restated some guidelines for when a data set is likely to conform to Benford’s Law:

  • A large sample: In small collections of numbers, chance deviations from the expected proportions appear noticeable.
  • A wide span of numerical values: The sample should include values across many orders of magnitude.
  • Right-skewed distributions: Conforming sets often originate from multiplying or combining quantities.
  • Non-arbitrary values: Arbitrary assignments of numbers do not exhibit these patterns.
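To illustrate the span guideline, here is a small simulation on synthetic data (not election data). It assumes log-uniform values from 1 to 9,999 as a wide-span set and uniform whole numbers from 100 to 999 as a narrow-span set; the seed and sample size are arbitrary choices.

```python
import math
import random
from collections import Counter

def leading_digit(n: int) -> int:
    """First digit of a positive integer."""
    return int(str(n)[0])

def digit_shares(values):
    """Proportion of values with each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
N = 100_000

# Wide span: log-uniform values from 1 to 9,999 (four orders of magnitude).
wide = [int(10 ** rng.uniform(0, 4)) for _ in range(N)]
# Narrow span: uniform whole numbers from 100 to 999 (one order of magnitude).
narrow = [rng.randint(100, 999) for _ in range(N)]

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
wide_shares = digit_shares(wide)
narrow_shares = digit_shares(narrow)

for d in range(1, 10):
    print(f"{d}: Benford {benford[d]:.3f}  wide {wide_shares[d]:.3f}  narrow {narrow_shares[d]:.3f}")
```

The wide-span sample lands close to Benford’s proportions, while the narrow-span sample gives each leading digit roughly one-ninth of the values.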

Some formal probability distributions follow Benford’s Law, and some of these appear in nature.

Some data sets are close to the expected proportions. Others are not. (Image: Significance)

What about elections?

These claims of electoral fraud depend on misapplications of Benford’s Law.

We should not expect precinct vote counts to follow the Newcomb-Benford distribution. For example, in the 2020 US Presidential election, over 97% of precincts in Chicago had a three-digit total vote count.

Of the 2,069 precincts, 2,023 had a vote total between 100 and 1,000. (Image: R Pubs)
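The 97% figure follows directly from the counts in that caption. A quick check, which also shows that totals between 100 and 1,000 cover a single order of magnitude:

```python
import math

precincts = 2069            # Chicago precincts, 2020 US Presidential election
three_digit_totals = 2023   # precincts with a total between 100 and 1,000

share = three_digit_totals / precincts
span = math.log10(1000 / 100)  # orders of magnitude covered by 100 to 1,000

print(f"Share with three-digit totals: {share:.1%}")  # 97.8%
print(f"Orders of magnitude spanned: {span:.0f}")     # 1
```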

Total votes cast per precinct follow a bell shape around the median, so that data set is not suitable for testing against the Newcomb-Benford distribution.
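A sketch of why the bell shape matters, using hypothetical precinct totals drawn from a normal distribution with mean 500 and standard deviation 150. Those parameters are assumptions chosen for illustration, not Chicago’s actual figures.

```python
import math
import random
from collections import Counter

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical bell-shaped precinct totals around 500 votes
# (illustrative mean and spread; clipped so every total is positive).
totals = [max(1, round(rng.gauss(500, 150))) for _ in range(50_000)]

counts = Counter(int(str(t)[0]) for t in totals)
shares = {d: counts[d] / len(totals) for d in range(1, 10)}
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: observed {shares[d]:.3f}  Benford {benford[d]:.3f}")
```

Most simulated totals start with 3 to 6, and a leading 1 appears in only about 2% of them, far below Benford’s 30.1%.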

Moreover, the Democratic vote share is high: over 80% in most precincts. Since many Democratic vote counts fall between 300 and 600, we should expect 3, 4, and 5 to be common leading digits.
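The Democratic count is roughly the precinct total multiplied by the vote share. For instance, a precinct of 500 voters at 80% gives 400 Democratic votes, with leading digit 4. A sketch with hypothetical precincts of 400 to 700 voters and a Democratic share of 80% to 90% (both ranges are assumptions):

```python
import random
from collections import Counter

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical precincts: 400 to 700 voters, Democratic share 80% to 90%.
dem_counts = [
    round(rng.randint(400, 700) * rng.uniform(0.80, 0.90))
    for _ in range(10_000)
]

digits = Counter(int(str(c)[0]) for c in dem_counts)
share_345 = sum(digits[d] for d in (3, 4, 5)) / len(dem_counts)

print(sorted(digits.items()))
print(f"Share with leading 3, 4 or 5: {share_345:.1%}")
```

Under these assumptions every count lies between 320 and 630, so no count can start with 1, 2, 7, 8 or 9.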

The Democrats have a high vote share in Chicago. (Image: R Pubs)

That is what we see: 3, 4, and 5 are common leading digits of Democratic vote counts.

We would not expect conformity to the Newcomb-Benford distribution. (Image: R Pubs)

Vote counts for the two parties are not independent: almost all votes are cast for Democrats or Republicans. Suppose the leading digits of Republican vote counts did follow the pattern. Given precinct sizes, the Democratic counts could then not conform to the Newcomb-Benford distribution. In this flawed analysis, one party must appear to diverge from the expected pattern of leading digits.
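A sketch of this constraint, under stated assumptions: hypothetical precincts of 400 to 700 voters, a Republican count drawn log-uniformly between 1 and the precinct total (so its leading digits roughly follow Benford’s Law), and every remaining vote going to the Democrats. Real precincts are not generated this way; the point is only the arithmetic link between the two counts.

```python
import math
import random

def leading_digit(n: int) -> int:
    """First digit of a positive integer."""
    return int(str(n)[0])

def share_of_ones(values):
    """Proportion of values whose leading digit is 1."""
    return sum(leading_digit(v) == 1 for v in values) / len(values)

rng = random.Random(3)  # fixed seed for reproducibility
rep, dem = [], []
for _ in range(20_000):
    total = rng.randint(400, 700)  # hypothetical precinct size
    # Hypothetical Republican count: log-uniform between 1 and total - 1,
    # so its leading digits roughly follow Benford's Law.
    r = int(10 ** rng.uniform(0, math.log10(total)))
    r = min(r, total - 1)  # guard against floating-point rounding at the top end
    rep.append(r)
    dem.append(total - r)  # assumption: all remaining votes are Democratic

print(f"Benford share of leading 1s: {math.log10(2):.3f}")
print(f"Republican counts:           {share_of_ones(rep):.3f}")
print(f"Democratic counts:           {share_of_ones(dem):.3f}")
```

The Republican counts start with 1 at a rate near Benford’s 30.1%, while the Democratic counts, being the total minus a mostly small number, rarely do.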

As Prof Walter Mebane (Michigan) writes:

It is widely understood that the first digits of precinct vote counts are not useful for trying to diagnose election frauds.

Analysing the distribution of second digits is not a great diagnostic tool either:

It is not simply that the Law occasionally judges a fraudulent election fair or a fair election fraudulent. Its “success rate” either way is essentially equivalent to a toss of a coin, thereby rendering it problematical at best as a forensic tool and wholly misleading at worst.
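Second-digit tests compare counts with the distribution Benford’s Law implies for the second digit. In its standard form, the expected share of second digit d is the sum, over first digits k from 1 to 9, of log10(1 + 1/(10k + d)). A sketch of those reference proportions (the function name is illustrative):

```python
import math

def benford_second_digit_probs():
    """Expected share of second digit d2 = 0..9 under Benford's Law:
    the sum over first digits k = 1..9 of log10(1 + 1/(10k + d2))."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * k + d2)) for k in range(1, 10))
        for d2 in range(10)
    }

for d2, p in benford_second_digit_probs().items():
    print(f"{d2}: {p:.3f}")
```

The proportions run from about 12.0% for 0 down to about 8.5% for 9, a much flatter pattern than for first digits.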

In applications like forensic accounting, non-conformity is not ‘proof’ of fraud. It is often a flag for anomalies that merit further investigation, such as a standard audit.

Non-conformity also raises statistical questions. What is the distribution of the error term? Analysts should be explicit about their test models.

Without such a model, tests may treat differences between predicted patterns and real data inconsistently. That can lead to mistaken analysis and misinterpretation. William Goodman writes:

Without an error term it is too imprecise to say that a data set “does not conform” to Benford’s law. By how much does it have to differ from expected values to “not conform”?
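One common way to attach an error model to the question is Pearson’s chi-squared goodness-of-fit test. The sketch below assumes that test, a 5% significance level, and counts large enough for the chi-squared approximation; the analyses discussed here may have used different tests. With nine digit categories there are 8 degrees of freedom, and the critical value is 15.507.

```python
import math
from collections import Counter

# Chi-squared critical value for 8 degrees of freedom (nine digit
# categories minus one) at the 5% significance level.
CHI2_CRIT_8DF_5PCT = 15.507

def benford_chi_squared(first_digits):
    """Pearson chi-squared statistic comparing observed leading digits
    (integers 1..9) with the counts Benford's Law predicts."""
    n = len(first_digits)
    observed = Counter(first_digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat

def conforms(first_digits, critical=CHI2_CRIT_8DF_5PCT):
    """True if the statistic does not exceed the critical value."""
    return benford_chi_squared(first_digits) <= critical

# Usage: 1,000 digits in almost exactly Benford's proportions...
close = [d for d in range(1, 10) for _ in range(round(1000 * math.log10(1 + 1 / d)))]
# ...versus 1,000 digits concentrated on 3 to 6, as with bell-shaped counts.
skewed = [3] * 300 + [4] * 300 + [5] * 300 + [6] * 100

print(round(benford_chi_squared(close), 3), conforms(close))    # ... True
print(round(benford_chi_squared(skewed), 1), conforms(skewed))  # ... False
```

A ‘does not conform’ verdict only says the digits differ from Benford’s proportions by more than chance would suggest under this model. For precinct vote counts, such a difference is expected and is not evidence of fraud.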

Applying Benford’s Law to precinct vote counts is inappropriate. Total votes in precincts do not span many orders of magnitude, so we would not expect their leading digits to show such patterns.

This method does not provide evidence of electoral fraud. Benford’s Law does not work as an automatic fraud detector.

Matt Parker produced a video on Benford’s Law. Prof Golbeck (UMD) also wrote about this topic. The R code for the graphs is available on R Pubs and GitHub.

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.
