Into the tidyverse: dplyr

The dplyr package offers alternatives to base R functions.

Anthony B. Masters
3 min readJan 26, 2021

I completed an introductory course in using R for statistical analysis. The instructions were in the underlying base functions in R (base R).

This article compares the code for data manipulation, using dplyr and base R. This article only covers basic tasks.

Data management and manipulation

Introductory courses ask students to undertake simple manipulate of data frames. These courses may use the base R functions.

Since 2016, there have been a collection of packages which share the same design philosophy. As the packages focus on ‘tidy’ data, designers call this the tidyverse.

Let’s begin. (Image: tidyverse.org)

In this article, I will use the standard iris data-set in R.

One basic step is to count how many rows meet one criterion. We want to know how many plants in our data-set are in the Setosa species.

In base R, we turn the data frame into a series of ‘TRUE’ and ‘FALSE’ entries. When we sum, that turns ‘TRUE’ values into 1, and ‘FALSE’ into 0. The sum then counts the rows which meet the criterion.

sum(iris$Species == "setosa")

--

--

Anthony B. Masters

This blog looks at the use of statistics in Britain and beyond. It is written by RSS Statistical Ambassador and Chartered Statistician @anthonybmasters.