Behind the Numbers: SOTY 2020

How did analysts calculate the different numbers?

The Royal Statistical Society runs the Statistics of the Year competition to highlight key figures from the past year.

This article provides information about quality and methods for the six chosen numbers. The original version of this article appears on the Royal Statistical Society’s website.

International Statistic of the Year

Winner: 332 days

The length of time between scientists publishing the genetic sequence of COVID-19 (11 January 2020) and an effective vaccine being administered (8 December 2020).

Various sources give different dates for when scientists published the genetic sequence. The European Centre for Disease Prevention and Control gives 10th January 2020, which refers to the deposit on the GenBank database under accession number MN908947. The World Health Organisation website gives 11th January 2020. This discrepancy may be due to time zones: the Virological post says it was published on ‘10th January’, but the record is time-stamped ‘Jan 11, 1:05AM’.

The NHS first administered the Pfizer/BioNTech vaccine on 8th December 2020.

If we take 11th January as the start date, then there were 332 days to 8th December.
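The day count is a simple date subtraction, and 2020 was a leap year, so February had 29 days. A short check using only Python's standard library, taking the two start dates quoted above:

```python
from datetime import date

# Start: sequence date as given by the WHO (11 January 2020).
# End: first NHS administration of the Pfizer/BioNTech vaccine (8 December 2020).
sequence_published = date(2020, 1, 11)
first_vaccination = date(2020, 12, 8)

gap_days = (first_vaccination - sequence_published).days
print(gap_days)  # 332

# Taking the ECDC date (10 January 2020) as the start adds one day.
gap_days_ecdc = (first_vaccination - date(2020, 1, 10)).days
print(gap_days_ecdc)  # 333
```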

Highly Commended: 3 out of 5

Only three out of five people worldwide have basic handwashing facilities, according to the latest estimates.

Unicef runs the Joint Monitoring Programme for Water Supply, Sanitation and Hygiene (JMP), which Unicef and the World Health Organisation established in 1990. According to the JMP’s 2017 household estimates, 60% of the world’s population had ‘basic hygiene’ facilities at home: that is, the estimated proportion of people living in homes with soap and water for hand-washing. The headline statistic comes from the JMP Hygiene Baselines Pre-COVID-19 Global Snapshot.

Global studies often have data gaps. In 2017, 78 countries had estimates for basic hand-washing facilities, and there are no comprehensive estimates of basic hygiene facilities in healthcare buildings. The JMP measures access to hand-washing facilities rather than hand-washing habits, since asking people about their habits could lead respondents to over-report ‘good’ behaviour.

The JMP’s estimates start from nationally representative data sources, including censuses, surveys and administrative data. The Programme consults with countries about which data sources are appropriate.

The JMP estimates nine primary indicators from these data inputs. When two points in the same series are more than five years apart, researchers interpolate between them using ordinary least squares linear regression; when the gap is less than five years, a simple average joins the points. There are also rules for extrapolation and for extending series, and the reports impute estimates where data points are missing.
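To make the gap rule concrete, here is a minimal Python sketch of one literal reading of it. It is not the JMP's own code. The function names and toy values are hypothetical, and two choices are assumptions: the regression is fitted to the whole series, and a gap of exactly five years (which the description above does not cover) is treated like the shorter gaps.

```python
from statistics import mean


def ols_fit(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def estimate_between(points, year, max_gap=5):
    """Estimate a value at `year` from (year, value) data points.

    Hypothetical reading of the rule: if the two surrounding points are
    more than `max_gap` years apart, predict from an OLS line fitted to
    the whole series; otherwise take the simple average of the two points.
    """
    points = sorted(points)
    observed = dict(points)
    if year in observed:
        return observed[year]
    for (y0, v0), (y1, v1) in zip(points, points[1:]):
        if y0 < year < y1:
            if y1 - y0 > max_gap:
                a, b = ols_fit([p[0] for p in points], [p[1] for p in points])
                return a + b * year
            return (v0 + v1) / 2
    raise ValueError("year is outside the series; extrapolation rules differ")


# Toy percentages for illustration only, not JMP data.
print(estimate_between([(2000, 50.0), (2010, 70.0)], 2005))  # 60.0
print(estimate_between([(2000, 50.0), (2003, 56.0)], 2001))  # 53.0
```

With only two points, the OLS line is simply the straight line through them, so the long-gap branch reduces to linear interpolation; the two approaches diverge only when a series has three or more points.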

Highly Commended: 5.5 million years

According to the latest estimates, over three trillion minutes will be spent on Zoom globally this year — equivalent to around 5.5 million years.

According to the BBC, ‘the firm is forecast to have hosted three trillion minutes of meetings by the end of the year.’ Eric Yuan (Zoom CEO) shared this forecast in an interview, and the Zoom public relations team confirmed this projection of global ‘annualised meeting minutes’. The projection is based on the September 2020 run rate. Forecasting is uncertain: the realised value could be higher or lower.

Three trillion minutes is around 5.7 million years. We have rounded this figure down, as this is an uncertain forecast.
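The conversion divides the forecast by the number of minutes in a year. This sketch assumes a 365.25-day year; a 365-day year gives the same 5.7 million to one decimal place.

```python
FORECAST_MINUTES = 3e12  # annualised meeting minutes (September 2020 run rate)

# Assumption: an average year of 365.25 days.
minutes_per_year = 60 * 24 * 365.25
years = FORECAST_MINUTES / minutes_per_year

print(f"{years / 1e6:.1f} million years")  # 5.7 million years
```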

UK Statistic of the Year

Winner: 17,750

The number of excess deaths in care homes in England and Wales between 4th April and 1st May. Total deaths in care homes were 200% higher than the five-year average, compared with 85% higher at home and 65% higher in hospitals.

Excess deaths are the number of deaths above a past or modelled baseline. For its baseline, the Office for National Statistics uses the past five-year average. In 2020, that average refers to the same weeks in 2015 to 2019.

Every death that occurs in England and Wales must be registered there. The ONS collates death registration statistics from register offices, where registrars record information about each death in the Registration Online system.

Weekly death registrations are provisional. The headline statistic covers deaths in care homes that were registered between 4th April and 1st May (weeks 15 to 18). The registration date differs from the date of death, because there is a delay between someone dying and the death being registered. For most deaths, that delay should be five days or fewer, but some registrations take longer, for example while waiting for a coroner’s report. Moveable public holidays, such as Easter, also affect weekly registration volumes.

In weeks 15 to 18, there were 26,563 registered deaths in care homes, compared with an average of 8,813 in the same weeks over the previous five years: an excess of 17,750 deaths. Total deaths in care homes were therefore 201% higher (rounded to 200%) than the five-year average. The statistics come from Figure 6 in the ONS weekly provisional death report, which included registrations up to the week ending 27th November 2020.
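The headline figure and the percentage follow from two subtractions and a division. A minimal sketch using the ONS care home figures quoted above:

```python
# Care home deaths registered in weeks 15 to 18 of 2020 (ONS figures quoted above)
registered_2020 = 26_563
five_year_average = 8_813  # same weeks, averaged over 2015 to 2019

excess = registered_2020 - five_year_average
pct_above = 100 * excess / five_year_average

print(excess)               # 17750
print(f"{pct_above:.0f}%")  # 201%
```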

All registration figures for 2020 are provisional. (Image: ONS)

Total deaths in weeks 15 to 18 were also higher than the five-year average in other places: 85% higher at home and 65% higher in hospitals. The weekly report has four categories for place of death:

  • Hospital: acute or community hospitals, but not psychiatric hospitals.
  • Care homes: homes for the elderly and those with chronic illnesses. This category includes nursing homes and homes for people with mental health problems.
  • Home: the deceased’s usual residence, according to the informant. This excludes communal places.
  • Other: this category includes hospices and other communal places (like schools and hotels). It also includes deaths elsewhere, such as on motorways or those who were dead on arrival at the hospital.

Analysts could also look at deaths from all causes in all locations. In England and Wales, there were 80,817 deaths registered in weeks 15 to 18, against a five-year average of 41,416. Excess registered deaths were 39,401 (95% of the five-year average). The ONS also published a comparison of all-cause mortality across European countries.

National Records of Scotland publishes death registration statistics for Scotland. The statistical offices use different registration weeks: weeks 15 to 18 correspond to 6th April to 3rd May in Scotland. In these weeks, there were 7,409 registered deaths, against a five-year average of 4,333. Excess registered deaths were 3,076 (71% of the five-year average).

The Northern Ireland Statistics and Research Agency performs the same function in Northern Ireland. Its registration weeks are numbered one behind those of the ONS. For weeks 14 to 17 (4th April to 1st May), there were 1,756 registered deaths, against a five-year average of 1,189. Excess registered deaths were 567 (48% of the five-year average).
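The same excess-death calculation applies to each nation's all-cause, all-location registrations. A short sketch recomputing the excess and its share of the baseline from the registered deaths and five-year averages quoted above:

```python
# (registered deaths, five-year average) for the four-week periods quoted above
figures = {
    "England and Wales": (80_817, 41_416),
    "Scotland": (7_409, 4_333),
    "Northern Ireland": (1_756, 1_189),
}


def excess_deaths(registered, baseline):
    """Return excess deaths and the excess as a percentage of the baseline."""
    excess = registered - baseline
    return excess, 100 * excess / baseline


for nation, (registered, baseline) in figures.items():
    excess, pct = excess_deaths(registered, baseline)
    print(f"{nation}: {excess:,} excess ({pct:.0f}% of the five-year average)")
```

Running it reproduces 39,401 (95%) for England and Wales, 3,076 (71%) for Scotland and 567 (48%) for Northern Ireland.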

Highly Commended: 19 times

Black men aged 18 to 24 in London are, on average, 19 times more likely to be stopped and searched, in comparison to the city’s overall population.

The UCL Institute for Global City Policing calculated this statistic. Their report is Stop and Search in London — July to September 2020, published in November 2020.

The Institute drew data on police searches from data.police.uk. Searches ‘in London’ refers to searches by:

  • the Metropolitan Police Service,
  • City of London Police,
  • the British Transport Police at a location in London.

Police forces update the data set of individual stop and search records monthly. The UCL report uses self-identified ethnicity in this data set.

The figures use 2020 estimates of London’s population by age and ethnicity, and the report expresses search rates per 1,000 people. The Greater London Authority produces these population estimates, which are 2016-based projections. Projections by ethnicity use migration patterns found in the 2011 Census, and those patterns may have changed since then. There is a further limitation: not everyone whom the police stop and search in London lives there.

The lead UCL academic, Dr Matt Ashby, shared the R code used to calculate stop and search disparities. That code shows the calculation for Metropolitan Police Service searches only: on that basis, the search rate for black men aged 18–24 was 18 times higher than for the overall population.
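The disparity is a ratio of two search rates. The Python sketch below shows that general rate-ratio calculation; it is not a translation of Dr Ashby's R code, the function names are hypothetical, and the counts in the example are round illustrative numbers, not figures from the report.

```python
def rate_per_1000(searches, population):
    """Searches per 1,000 residents over the period."""
    return 1000 * searches / population


def disparity_ratio(group_searches, group_pop, total_searches, total_pop):
    """How many times higher the group's search rate is than the overall rate."""
    group_rate = rate_per_1000(group_searches, group_pop)
    overall_rate = rate_per_1000(total_searches, total_pop)
    return group_rate / overall_rate


# Illustrative numbers only: 500 searches among 10,000 people in a group,
# against 20,000 searches across a population of 8,000,000.
print(rate_per_1000(500, 10_000))                        # 50.0
print(disparity_ratio(500, 10_000, 20_000, 8_000_000))   # 20.0
```

Because both rates are per 1,000 people, the multiplier cancels in the ratio; it only matters when reporting the rates themselves.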

Highly Commended: 19%

Around one in five adults (19%) were likely to be experiencing some form of depression during the coronavirus pandemic (June 2020). This was almost double the average of around one in ten (10%) before the pandemic (July 2019 to March 2020).

The Office for National Statistics conducts the Opinion and Lifestyle Survey, which runs for eight months of the year with around 2,000 respondents each month. The ONS draws these samples from its wider Annual Population Survey (APS), which uses the Royal Mail’s Postal Address File as its sampling frame. The figures come from the ONS release Coronavirus and depression in adults, Great Britain: June 2020.

Adult respondents (aged 16 or over) answered the eight items of the Patient Health Questionnaire (PHQ-8). The resulting score is a self-reported measure of depression over the past two weeks. For each item, respondents say how often they have been ‘bothered’ by a problem, from ‘not at all’ (scored 0) to ‘nearly every day’ (scored 3). The total score is out of 24, and a score of 10 or more means the respondent reports ‘moderate to severe symptoms’.
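The scoring rule can be sketched in a few lines of Python. The two middle response options used here (‘several days’ scored 1, ‘more than half the days’ scored 2) come from the standard PHQ response scale rather than from the ONS release, and the function names are hypothetical.

```python
# Standard PHQ response options and their item scores (0 to 3)
RESPONSE_SCORES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}


def phq8_total(responses):
    """Sum the eight PHQ-8 item scores, giving a total from 0 to 24."""
    if len(responses) != 8:
        raise ValueError("PHQ-8 has exactly eight items")
    return sum(RESPONSE_SCORES[r.lower()] for r in responses)


def moderate_to_severe(total, threshold=10):
    """Flag totals at or above the threshold used for the headline statistic."""
    return total >= threshold


answers = ["several days"] * 4 + ["more than half the days"] * 3 + ["not at all"]
score = phq8_total(answers)
print(score, moderate_to_severe(score))  # 10 True
```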

Surveys provide estimates, which are subject to many sources of potential error. The statistic is the proportion of respondents with a PHQ-8 score of 10 or higher. The confidence interval around the earlier central estimate of 10% is 8% to 12%. For the June 2020 wave, the confidence interval is 16% to 22%, around a central estimate of 19%.

There were two waves of this survey. The first wave ran between July 2019 and March 2020, using telephone interviews. The same respondents were then re-contacted between 4th and 14th June 2020 with an online-first survey, with the option of a telephone interview where an online survey was not possible.

The advantage of this design is that researchers ask the same respondents twice, so they can conduct a longitudinal analysis of the likelihood of self-reported depression symptoms. One limitation is small sub-samples, such as for younger adults, which make estimates for these groups less precise. The change in survey mode, from telephone interviews to online self-completion, may also affect responses.

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.
