Into the tidyverse: dplyr
The dplyr package offers alternatives to base R functions.
I completed an introductory course in using R for statistical analysis. The course taught the functions that come with R itself, known as base R.
This article compares dplyr and base R code for data manipulation. It covers only basic tasks.
Data management and manipulation
Introductory courses ask students to carry out simple manipulation of data frames. These courses often teach the base R functions for this.
Since 2016, there has been a collection of packages which share the same design philosophy. As the packages focus on ‘tidy’ data, their designers call this collection the tidyverse. The dplyr package is one of them.
In this article, I will use the standard iris dataset that comes with R.
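Before counting anything, it can help to look at the shape of the data. This is a small sketch using only base R functions. iris is available without loading any package, and the checks below confirm the row count and how the species names are spelled.

```r
# iris ships with R (datasets package), so no library() call is needed

# Number of rows and columns: four numeric measurements plus Species
dim(iris)

# Species is a factor; its levels are stored in lower case
levels(iris$Species)

# How many plants belong to each species
table(iris$Species)
```

The species names are lower case, which is why the comparison below uses "setosa" rather than "Setosa".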
One basic task is to count how many rows meet a single criterion. Here, we want to know how many plants in the dataset belong to the setosa species.
In base R, we compare the Species column with "setosa". This gives a series of ‘TRUE’ and ‘FALSE’ entries, one for each row. When we sum that series, R treats each ‘TRUE’ as 1 and each ‘FALSE’ as 0, so the sum counts the rows which meet the criterion.
sum(iris$Species == "setosa")
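For comparison, here is one way to get the same count with dplyr. This is a sketch that assumes dplyr is installed; filter(), count() and the %>% pipe are exported by dplyr, but other routes such as tally() would work too. The first line also shows the ‘TRUE’/‘FALSE’ series that the base R sum works on.

```r
# Assumes dplyr is installed, e.g. with install.packages("dplyr")
library(dplyr)

# The comparison used inside sum(): one TRUE/FALSE per row.
# The first rows of iris are all setosa plants, so these are TRUE.
head(iris$Species == "setosa")

# dplyr approach 1: keep only the setosa rows, then count them
iris %>%
  filter(Species == "setosa") %>%
  nrow()

# dplyr approach 2: count rows per species in a single table,
# returned as a data frame with one row per species and a column n
iris %>%
  count(Species)
```

The first dplyr approach mirrors the base R logic step by step. The second answers a broader question, giving the count for every species at once.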