The goal of data analysts, data scientists and statisticians is to analyse data, guiding decisions for the public good. This work involves communicating findings in a simple and understandable manner.
A major difficulty with the use (or misuse) of statistics in workplaces and public debate is the ongoing quest for The Number. Ed Humpherson, Director General for Regulation at the UK Statistics Authority, described this phenomenon:
Statisticians are often asked for The Number, the single point estimate that resolves an issue — or can be dropped into a briefing or speech as if it was just padding, mere upholstery. But the statistical leader knows that the single number is elusive; whether it means what people think it means depends crucially on what question you are trying to answer.
This article will look at potential problems induced by our desire for The Number.
No single way of looking at an issue
One example is the annual rate of nominal pay growth in the public sector, which is used by the Independent Parliamentary Standards Authority. There are eight measures which could reasonably describe ‘pay growth in the public sector’, based on whether you choose:
- total pay (including bonuses), or regular pay (excluding bonuses);
- to include or exclude financial services in the public sector;
- to look at a single month, or a three-month rolling average.
None of these answers is ‘wrong’: they are different ways of looking at the question. Each measure provides another element in our mosaic of evidence.
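The arithmetic behind ‘eight measures’ follows directly from the three binary choices above. A minimal sketch (the labels are illustrative, not official series names):

```python
from itertools import product

# The three binary choices described above (illustrative labels only)
pay_basis = ["total pay (incl. bonuses)", "regular pay (excl. bonuses)"]
coverage = ["including financial services", "excluding financial services"]
window = ["single month", "three-month rolling average"]

# Every combination of choices yields a distinct, defensible measure
measures = list(product(pay_basis, coverage, window))
print(len(measures))  # 2 x 2 x 2 = 8
for basis, cov, win in measures:
    print(f"{basis} / {cov} / {win}")
```

Each of the eight printed combinations is a reasonable reading of ‘pay growth in the public sector’, which is precisely why a single requested Number is ambiguous without its definition.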
These differences can go beyond definitions. In a paper published in 2015, 29 research teams were given the same dataset and the same question: are football referees more likely to give red cards to players with dark skin than to players with light skin?
These separate analyses produced varied results, with all but two pointing in the same direction. Analytical choices matter: we need to be open and transparent about the definitions and techniques used.
Data as decoration
Another problem is that requests may be made simply to get The Number to slot into a presentation, a press release, or a speech.
In this scenario, data does not lead to better decisions: it is decoration. Here, the analyst is reduced to a calculator — cranking out a stat for use elsewhere. No analysis is asked for. A statistic may not mean what others think it means.
A family of examples comes from ‘killer stats’ used by political parties, such as “there are 1.9m more children at good or outstanding schools than there were in 2010”. The Education Policy Institute highlighted that this number, whilst accurate, partly reflected growth in the total pupil population, some schools going without inspections since 2010, and classification changes in primary schools associated with new inspection categories.
The other side is that statistics may be dismissed as ‘meaningless’ or ‘worthless’ unless they answer a narrow question: a bauble or the bin. Consider, for example, this passage from a recent article on gender pay gap reporting measures by Kate Andrews (Institute of Economic Affairs):
Why are we giving credence to meaningless and often deceptive gender pay gap statistics, which have us focusing on women’s issues in a way that is damaging to women?
Words have meaning, as do statistics. It is entirely meaningful to compare the pay of the average man and the average woman within an organisation. Gender pay gaps are not measures of in-job pay discrimination, but of the difference in pay between the average man and the average woman in that company.
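What such a gap measures can be made concrete. A minimal sketch using made-up figures for a hypothetical organisation, computing the headline gap as the difference in average pay as a share of average male pay:

```python
# Illustrative (made-up) hourly pay figures for one hypothetical organisation
male_pay = [18.0, 22.0, 35.0, 50.0]
female_pay = [17.5, 21.0, 24.0, 30.0]

def mean(xs):
    return sum(xs) / len(xs)

# The gap compares the average man with the average woman in the organisation;
# it is not a measure of unequal pay for the same job.
gap_pct = 100 * (mean(male_pay) - mean(female_pay)) / mean(male_pay)
print(round(gap_pct, 1))  # 26.0 (per cent)
```

A large gap here is driven by who holds the higher-paid roles, not by men and women being paid differently for identical work — which is exactly the distinction the passage above draws.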
Definiteness over uncertainty
A third problem is that, by seeking to encapsulate an issue with The Number, we prioritise definiteness over uncertainty.
This issue often arises in public debates when polling data is discussed. Survey research offers a mirror to society. Politicians and commentators also like to make claims about public opinion, so it matters that those claims can be tested. However, survey research is based on samples, providing estimates of, not exact figures for, the total population.
The answers to a single survey question may be used to support a pre-stated belief — about the supposed popularity of a policy or position. A recent example is numerous Conservative MPs using a YouGov poll to claim that ‘the country wants No Deal’ (to leave the European Union without a deal) and other variants.
As Anthony Wells (YouGov) highlights, that survey also found that 50% of the sample believed No Deal was a bad outcome, while 26% answered that No Deal was their preferred outcome. The specific question was worded:
And if Britain has not agreed a deal by April 12th and the European Union refused to grant a further extension, what do you think should happen?
In that hypothetical scenario — which was never realised — No Deal received 44%, and remaining in the European Union (Remain) was chosen by 42%.
In a survey of 2,098 GB adults, these two estimates of support are similar and difficult to distinguish. Surveys are not precise enough to be confident — from a single reading — that No Deal is more popular than Remain in that scenario.
Focusing on point estimates invites this statistical difficulty. Survey estimates have plausible ranges: we should be muscular about uncertainty. Question wording and trends should be studied, and single poll results treated cautiously.
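The overlap between those two poll figures can be sketched numerically. Assuming a simple random sample (real polls are weighted, so true uncertainty is somewhat larger), the approximate 95% margin of error on each estimate is:

```python
import math

n = 2098          # GB adults in the sample
p_no_deal = 0.44  # support for No Deal in the hypothetical scenario
p_remain = 0.42   # support for Remain

def margin_of_error(p, n, z=1.96):
    # Approximate 95% margin of error for a simple random sample proportion.
    # Weighted polls have somewhat wider true intervals than this.
    return z * math.sqrt(p * (1 - p) / n)

for name, p in [("No Deal", p_no_deal), ("Remain", p_remain)]:
    print(f"{name}: {p:.0%} ± {margin_of_error(p, n):.1%}")
```

Each estimate carries a margin of roughly ±2 percentage points, so the two plausible ranges overlap substantially: a single reading of 44% versus 42% cannot establish that one option is genuinely more popular than the other.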
No single statistic can tell the whole story. The desire for The Number causes serious issues when statistics are used in public debate. Analysts should balance the need to answer specific questions with the duty to inform widely, with honesty, integrity and generosity.