
Life satisfaction and data wrangling II

Dates, strings, and other fun things.

Anthony B. Masters
3 min read · Oct 26, 2021

In an earlier post, I showed the code for a graph about life satisfaction. The graph displays estimates from an Office for National Statistics (ONS) survey.

In the data file, the dates were in an inconsistent text format. With a deadline looming, I typed out the desired dates by hand.

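library(tibble)     # tribble()
library(dplyr)      # mutate() and the %>% pipe
library(lubridate)  # as_date()
ons_dates_df <- tribble(~start_date,  ~end_date,
                        "2020-03-20", "2020-03-30",
                        "2020-03-27", "2020-04-06"
                        # …, … (further rows omitted)
                        ) %>%
  mutate(start_date = as_date(start_date),
         end_date = as_date(end_date))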
This was the basic problem. (Image: xkcd)

This time, I wrangle the dates from the lines of text themselves. This is how the dates appear in the ONS file (with line breaks removed):

  • 20 to 30 March
  • 27 March to 6 April
  • 30 Sept to 4 Oct
  • 22 Dec ’20 to 3 Jan ‘21

Here, ‘to’ separates the fieldwork start date from the end date. The day number comes first. If both dates fall in the same month, that month is shown only once. Months appear in both full and abbreviated forms (“March”, “Sept”, “Oct”), and two-digit years (’20) sometimes appear.
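As a rough illustration of those rules (a sketch under stated assumptions, not necessarily the approach this post goes on to use), each line can be split on " to " and the missing pieces of the start date filled in from the end date. This version uses base R only. It assumes that the first three letters of a month name identify it (matched against base R's `month.abb`, so “Sept” becomes “Sep”), that lines with no year at all belong to 2020 (the hypothetical `default_year` argument), and that a start month later than the end month means fieldwork ran over New Year. The helper names `parse_side()` and `parse_fieldwork()` are hypothetical.

```r
# Sketch: parse ONS-style fieldwork periods such as "27 March to 6 April".
# Assumption: each side of " to " reads "<day> [<month> [<'yy>]]".

# Parse one side of " to " into day, month and (optional) year.
parse_side <- function(side) {
  tokens <- strsplit(trimws(side), "\\s+")[[1]]
  day <- as.integer(tokens[1])
  month <- NA_integer_
  year <- NA_integer_
  if (length(tokens) >= 2) {
    # Match on the first three letters, so "March", "Mar" and "Sept" all work
    abbr <- substr(tokens[2], 1, 3)
    abbr <- paste0(toupper(substr(abbr, 1, 1)), tolower(substr(abbr, 2, 3)))
    month <- match(abbr, month.abb)
  }
  if (length(tokens) >= 3) {
    # Drop any straight or curly apostrophe, then expand two-digit years
    year <- as.integer(gsub("[^0-9]", "", tokens[3]))
    if (!is.na(year) && year < 100L) year <- 2000L + year
  }
  list(day = day, month = month, year = year)
}

# Assumption: default_year (hypothetical) applies when neither side has a year.
parse_fieldwork <- function(line, default_year = 2020L) {
  sides <- strsplit(line, " to ", fixed = TRUE)[[1]]
  stopifnot(length(sides) == 2)
  start <- parse_side(sides[1])
  end <- parse_side(sides[2])

  # A month shown only once belongs to both dates
  if (is.na(start$month)) start$month <- end$month

  # Fill in missing years; a start month after the end month means
  # the fieldwork period crossed New Year (an assumption)
  if (is.na(start$year) && is.na(end$year)) end$year <- default_year
  if (is.na(end$year)) {
    end$year <- start$year + as.integer(end$month < start$month)
  }
  if (is.na(start$year)) {
    start$year <- end$year - as.integer(start$month > end$month)
  }

  to_date <- function(p) as.Date(sprintf("%d-%02d-%02d", p$year, p$month, p$day))
  data.frame(start_date = to_date(start), end_date = to_date(end))
}

ons_lines <- c("20 to 30 March", "27 March to 6 April",
               "30 Sept to 4 Oct", "22 Dec ’20 to 3 Jan ‘21")
ons_dates_df <- do.call(rbind, lapply(ons_lines, parse_fieldwork))
print(ons_dates_df)
```

Each line becomes one row, so the result has the same `start_date` and `end_date` columns as the hand-typed table, and the first two rows should match it.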


This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.
