
Into the tidyverse: dplyr

The dplyr package offers alternatives to base R functions.

Jan 26, 2021

I completed an introductory course in using R for statistical analysis. The course taught the underlying base functions in R (base R).

This article compares data manipulation code written with dplyr and with base R. It covers only basic tasks.

Data management and manipulation

Introductory courses ask students to undertake simple manipulations of data frames. These courses may use the base R functions.

Since 2016, there has been a collection of packages that share the same design philosophy. As the packages focus on ‘tidy’ data, their designers call the collection the tidyverse.

In this article, I will use the standard iris data set in R.

One basic step is to count how many rows meet one criterion. We want to know how many plants in our data set are in the Setosa species.

In base R, we compare the Species column with "setosa". That turns the column into a series of ‘TRUE’ and ‘FALSE’ entries. The sum function treats ‘TRUE’ values as 1 and ‘FALSE’ values as 0, so the sum counts the rows which meet the criterion.

sum(iris$Species == "setosa")
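To see the coercion step on its own, here is a small base R sketch. The names is_setosa, n_setosa and prop_setosa are my own, for illustration. It also shows a related trick: the mean of a logical vector is the share of TRUE values.

```r
# Compare each species label with "setosa": a logical vector of length 150
is_setosa <- iris$Species == "setosa"

# as.integer() shows the coercion that sum() relies on: TRUE -> 1, FALSE -> 0
head(as.integer(is_setosa))

# Summing the logical vector counts the matching rows
n_setosa <- sum(is_setosa)

# The mean of a logical vector is the proportion of TRUE values
prop_setosa <- mean(is_setosa)

n_setosa
prop_setosa
```

With the iris data, this gives 50 plants, which is one third of the 150 rows.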

In dplyr, we often use the ‘forward-pipe’ operator %>%, which dplyr re-exports from the magrittr package. This operator passes the value on its left as the first argument of the function call on its right. That way, we can build our query step by step.

library(dplyr)
iris %>%
  filter(Species == "setosa") %>%
  count()
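The pipe is a way of rewriting nested function calls. As a sketch (the object names piped and nested are illustrative), the same query can be written both ways and compared:

```r
library(dplyr)

# Piped form: iris becomes the first argument of filter(),
# and the filtered rows become the first argument of count()
piped <- iris %>%
  filter(Species == "setosa") %>%
  count()

# Nested form of the same query, read from the inside out
nested <- count(filter(iris, Species == "setosa"))

# Both return a one-row data frame with a column called n
identical(piped, nested)
piped$n
```

The piped version reads in the order the steps happen, which is the main reason to prefer it as queries grow longer.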

Instead of a single count, we can produce frequency tables. In base R, one way is tapply, which splits the Species column by species and applies the length function to each group.

tapply(iris$Species, iris$Species, length)
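Base R also has the table function, which builds the same frequency table in one step. A sketch comparing the two (by_tapply and by_table are illustrative names):

```r
# tapply() splits iris$Species by species and applies length() to each piece
by_tapply <- tapply(iris$Species, iris$Species, length)

# table() counts each distinct value directly
by_table <- table(iris$Species)

by_tapply
by_table

# Same species names and the same counts
identical(names(by_tapply), names(by_table))
all(as.vector(by_tapply) == as.vector(by_table))
```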

In the tidyverse, we count by species:

iris %>% count(Species)
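The count function is a shortcut for grouping the rows and then counting the rows in each group. A sketch of the longer form, assuming dplyr 1.0 or later (the object names are illustrative):

```r
library(dplyr)

# The shortcut: one row per species, with the row count in a column called n
counted <- iris %>% count(Species)

# The longer form: group the rows by species, then count the rows per group
summarised <- iris %>%
  group_by(Species) %>%
  summarise(n = n())

counted
summarised
```

Both give one row per species with n equal to 50. The longer form is useful when we want other summaries, such as a mean, alongside the count.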

Another common manipulation is to create a subset. The base R and tidyverse code look similar:

# base R: subset() evaluates Species inside the iris data frame
iris_setosa_df <- subset(iris, Species == "setosa")
# dplyr
iris_setosa_df <- iris %>% filter(Species == "setosa")
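Filters often combine several conditions. As a sketch, assuming we want Setosa plants with a sepal length above 5 cm (the threshold and the object names are only for illustration): subset() takes one logical expression joined with &, while filter() also accepts comma-separated conditions and keeps rows where all of them are TRUE.

```r
library(dplyr)

# base R: join the conditions with & inside subset()
long_setosa_base <- subset(iris, Species == "setosa" & Sepal.Length > 5)

# dplyr: comma-separated conditions in filter() must all be TRUE
long_setosa_dplyr <- iris %>%
  filter(Species == "setosa", Sepal.Length > 5)

nrow(long_setosa_base)
nrow(long_setosa_dplyr)
```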

We may wish to take the top rows of the data set. In base R, the order function returns the positions that would put the vector elements in order. That order is then used…
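As a sketch, assuming we want the five rows with the largest sepal length (the column, the number of rows and the object names are my choices for illustration): in base R, the positions from order() index the rows before head() keeps the first five; in dplyr, arrange() with desc() sorts the rows and slice_head() keeps the first ones. slice_head() needs dplyr 1.0 or later.

```r
library(dplyr)

# base R: order() returns row positions, largest Sepal.Length first;
# indexing rows with those positions reorders the data frame
sorted_base <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]
top_base <- head(sorted_base, 5)

# dplyr: arrange() sorts the rows, desc() makes the order descending,
# and slice_head() keeps the first n rows
top_dplyr <- iris %>%
  arrange(desc(Sepal.Length)) %>%
  slice_head(n = 5)

top_base$Sepal.Length
top_dplyr$Sepal.Length
```

dplyr also offers slice_max(Sepal.Length, n = 5), which combines the two steps, though by default it keeps tied rows, so it can return more than five.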

Written by Anthony B. Masters
