Life satisfaction and data wrangling II
Dates, strings, and other fun things.
In an earlier post, I showed the code for a graph about life satisfaction. The graph displays estimates from an Office for National Statistics (ONS) survey.
In the data file, dates were in an inconsistent text format. Given the imminent deadline, I wrote out the desired dates.
ons_dates_df <- tribble(~start_date, ~end_date,
  "2020-03-20", "2020-03-30",
  "2020-03-27", "2020-04-06",
  …, …) %>%
  mutate(start_date = as_date(start_date),
         end_date = as_date(end_date))
This time, I wrangle the dates from lines of text. This is how these dates appear in the ONS file (with line breaks removed):
- 20 to 30 March
- 27 March to 6 April
- 30 Sept to 4 Oct
- 22 Dec ’20 to 3 Jan ‘21
Here, ‘to’ separates the fieldwork start date and end date. The numerical day comes first. If both dates fall in the same month, that month is shown only once. Months appear in full and abbreviated forms. Short years can also appear.
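To make those rules concrete, here is a minimal sketch of one way to turn such strings into dates. It is not necessarily the approach this post goes on to take. The function name `parse_fieldwork()` and its `default_year` argument are hypothetical. Because most of the strings carry no year, the sketch assumes the caller supplies one. It also assumes every month name starts with a standard three-letter English abbreviation, so that 'Sept' matches 'Sep'.

```r
library(stringr)
library(lubridate)

# Hypothetical helper: parse strings such as "27 March to 6 April".
# Assumption: a missing year is filled with `default_year`.
parse_fieldwork <- function(x, default_year) {
  # 'to' separates the start and end of fieldwork
  parts <- str_split_fixed(x, "\\s+to\\s+", n = 2)
  start <- str_trim(parts[, 1])
  end   <- str_trim(parts[, 2])

  # The numerical day comes first
  day_of <- function(p) as.integer(str_extract(p, "^\\d{1,2}"))

  # Full and abbreviated months: compare the first three letters with month.abb
  month_of <- function(p) {
    match(str_sub(str_extract(p, "[A-Za-z]+"), 1, 3), month.abb)
  }

  # Short years: two digits straight after an apostrophe (straight or curly)
  year_of <- function(p) {
    yy <- as.integer(str_extract(p, "(?<=['\u2018\u2019])\\d{2}$"))
    ifelse(is.na(yy), default_year, 2000 + yy)
  }

  # If the start has no month, both dates share the end month
  end_month   <- month_of(end)
  start_month <- ifelse(is.na(month_of(start)), end_month, month_of(start))

  data.frame(
    start_date = make_date(year_of(start), start_month, day_of(start)),
    end_date   = make_date(year_of(end), end_month, day_of(end))
  )
}

# Usage, with curly apostrophes written as Unicode escapes
ons_text <- c("20 to 30 March", "27 March to 6 April",
              "30 Sept to 4 Oct", "22 Dec \u201920 to 3 Jan \u201821")
parse_fieldwork(ons_text, default_year = 2020)
```

With `default_year = 2020`, the first row becomes 2020-03-20 to 2020-03-30 and the last becomes 2020-12-22 to 2021-01-03. The sketch does not handle a range that crosses a year boundary without stating the years, so those rows would need checking by eye.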