Reproducible analytical pipelines
How do analysts RAP a process up?
Producing statistics for publication is often a key part of analytical roles. Analysts could work for governments, sending official figures to ministers and the public. Reports could also be internal to a business, going to senior managers and decision-makers. Industry regulators may be another recipient.
Processes for calculating and publishing statistics can be cumbersome, with many manual steps. There may be many spreadsheets, passing from one team to another. Files may even depend on each other. For instance, a common route from the data store to the final document is:
- Statistical software exports a spreadsheet: A data portal may produce a spreadsheet. In other cases, some code runs — exporting a spreadsheet. Data ‘stores’ may also be flat files and other spreadsheets.
- Spreadsheet manipulation: That file then goes into another spreadsheet. Formulae convert the data tab into the desired graphs and tables.
- Copying into a Word document: Graphs and tables go into a document.
- Saving as a PDF: That document then transforms into a PDF.
What are the problems with this approach? There are many steps — taking time and leading to human error. Spreadsheet errors can be horrific…