We will be using tools contained in the dplyr package, which is loaded automatically when we load the tidyverse. dplyr has five main functions, corresponding to the most common things you’ll want to do with your data. We will learn each of these:
arrange()
filter()
mutate()
select()
summarise()
In Module 2, we looked at NBA shooting data over 20 seasons. When we visualized this data, we noticed that some players took very few shots of a certain type. To verify this, we could sort our tbl by the number of field goal attempts (FGA). We’ll start by loading the NBA shooting data again into a tbl called raw_shooting (the reasons for this naming convention will become clear soon).
> library(tidyverse)
> raw_shooting <- read_csv(file = "data/nba_shooting.csv")
Parsed with column specification:
cols(
PLAYER = col_character(),
SEASON = col_integer(),
FGM = col_integer(),
FGA = col_integer(),
TPM = col_integer(),
TPA = col_integer(),
FTM = col_integer(),
FTA = col_integer(),
FGP = col_double(),
TPP = col_double(),
FTP = col_double()
)
The arrange() function takes a tbl and a set of column names and sorts the rows according to the values in these columns, in ascending order by default.
> arrange(raw_shooting, FGA)
# A tibble: 7,447 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Dajuan Wa… 2007 1 1 1 1 1 2 1 1 0.5
2 Tyson Whe… 1999 1 1 1 1 1 2 1 1 0.5
3 Alvin Wil… 2007 0 2 0 1 2 4 0 0 0.5
4 Donald Wh… 1998 1 2 0 1 0 2 0.5 0 0
5 Mustafa S… 2014 0 3 0 1 1 2 0 0 0.5
6 John Luca… 2011 1 3 0 1 0 2 0.333 0 0
7 Roger Pow… 2007 0 3 0 1 2 2 0 0 1
8 Alvin Wil… 2006 0 3 0 2 1 2 0 0 0.5
9 Rusty LaR… 2004 1 3 1 1 1 2 0.333 1 0.5
10 Dell Demps 1997 0 3 0 1 2 2 0 0 1
# ... with 7,437 more rows
We now see that two player-seasons had only one field goal attempt. We could instead sort the data by FGA in descending order, using desc():
> arrange(raw_shooting, desc(FGA))
# A tibble: 7,447 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Kobe Brya… 2006 978 2173 180 518 696 819 0.450 0.347 0.850
2 Allen Ive… 2003 804 1940 84 303 570 736 0.414 0.277 0.774
3 Jerry Sta… 2001 774 1927 166 473 666 810 0.402 0.351 0.822
4 Kobe Brya… 2003 868 1924 124 324 601 713 0.451 0.383 0.843
5 Michael J… 1998 881 1893 30 126 565 721 0.465 0.238 0.784
6 Michael J… 1997 920 1892 111 297 480 576 0.486 0.374 0.833
7 LeBron Ja… 2006 875 1823 127 379 601 814 0.480 0.335 0.738
8 Allen Ive… 2006 815 1822 72 223 675 829 0.447 0.323 0.814
9 Allen Ive… 2005 771 1818 104 338 656 786 0.424 0.308 0.835
10 Tracy McG… 2003 829 1813 173 448 576 726 0.457 0.386 0.793
# ... with 7,437 more rows
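When we specify more than one column, arrange() uses each additional column to break ties in the values of the preceding columns. Below, rows are sorted by FGA, ties in FGA are broken by TPA, and any remaining ties are broken by FTA: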
> arrange(raw_shooting, FGA, TPA, FTA)
# A tibble: 7,447 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Dajuan Wa… 2007 1 1 1 1 1 2 1 1 0.5
2 Tyson Whe… 1999 1 1 1 1 1 2 1 1 0.5
3 Donald Wh… 1998 1 2 0 1 0 2 0.5 0 0
4 Alvin Wil… 2007 0 2 0 1 2 4 0 0 0.5
5 Mustafa S… 2014 0 3 0 1 1 2 0 0 0.5
6 John Luca… 2011 1 3 0 1 0 2 0.333 0 0
7 Roger Pow… 2007 0 3 0 1 2 2 0 0 1
8 Rusty LaR… 2004 1 3 1 1 1 2 0.333 1 0.5
9 Dell Demps 1997 0 3 0 1 2 2 0 0 1
10 Alvin Wil… 2006 0 3 0 2 1 2 0 0 0.5
# ... with 7,437 more rows
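The tie-breaking rule is easiest to see on a tiny hand-made tbl. The sketch below assumes dplyr is installed and uses hypothetical data (the PLAYER labels A to D are made up). It also shows that desc() can be applied to any of the sorting columns, not only the first.

```r
library(dplyr)

# Hypothetical toy data: three rows are tied on FGA = 3
shots <- tibble(
  PLAYER = c("A", "B", "C", "D"),
  FGA    = c(3, 1, 3, 3),
  TPA    = c(2, 1, 1, 1),
  FTA    = c(2, 2, 4, 2)
)

# FGA decides first; TPA breaks FGA ties; FTA breaks any remaining ties
by_all <- arrange(shots, FGA, TPA, FTA)
by_all$PLAYER    # "B" "D" "C" "A"

# desc() can reverse the order of any column, including a tie-breaker
by_desc <- arrange(shots, desc(FGA), TPA, desc(FTA))
by_desc$PLAYER   # "C" "D" "A" "B"
```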
When we start computing advanced statistics like effective field goal percentage and true shooting percentage, we probably don’t want to consider players for whom we have very little data, such as those who took only a handful of shots in a given season. The function filter() pulls out the subset of observations (rows) that satisfy a logical condition like “FGA > 100” or “FGA > 100 and FTA > 50”.
To make such comparisons in R, we have the following operators at our disposal:
“equal to”: ==
“not equal to”: !=
“less than” and “less than or equal to”: < and <=
“greater than” and “greater than or equal to”: > and >=
“AND”, “OR”, and “NOT”: &, |, and !
The code below keeps only the player-seasons with more than 100 field goal attempts:
> filter(raw_shooting, FGA > 100)
# A tibble: 6,295 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 6,285 more rows
When we run this code, R prints a tbl with 6,295 rows. However, it has not removed the players with 100 or fewer field goal attempts from the original tbl raw_shooting. dplyr functions never modify their input; instead, they return a modified copy. So if we want to keep working with just the players who had at least 100 field goal attempts, we need to save this copy as a new tbl. Arranging the new tbl by FGA verifies that every observation in it has at least 100 field goal attempts. (This time we use >= rather than >, so player-seasons with exactly 100 attempts are kept.)
> new_data <- filter(raw_shooting, FGA >= 100)
> arrange(new_data, FGA)
# A tibble: 6,306 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Jordan Fa… 2016 42 100 16 45 10 10 0.42 0.356 1
2 Jerry Sta… 2012 37 100 13 38 21 23 0.37 0.342 0.913
3 Brian Car… 2011 43 100 42 87 17 18 0.43 0.483 0.944
4 Kyrylo Fe… 2011 44 100 0 1 18 46 0.44 0 0.391
5 Jonathan … 2010 40 100 14 39 24 26 0.4 0.359 0.923
6 Ryan Bowen 2008 49 100 0 1 16 29 0.49 0 0.552
7 Richie Fr… 2006 39 100 23 70 7 10 0.39 0.329 0.7
8 Lindsey H… 2006 37 100 11 43 2 4 0.37 0.256 0.5
9 Oliver Mi… 2004 53 100 0 1 15 23 0.53 0 0.652
10 Mark Jack… 2004 34 100 7 41 28 39 0.34 0.171 0.718
# ... with 6,296 more rows
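The gap between the 6,295 rows returned by FGA > 100 and the 6,306 rows returned by FGA >= 100 is exactly the player-seasons with FGA equal to 100. A minimal sketch on hypothetical data (assumes dplyr is installed) makes the boundary behavior of the comparison operators visible, and confirms that filter() leaves its input unchanged.

```r
library(dplyr)

# Hypothetical toy data straddling the 100-attempt cutoff
shots <- tibble(PLAYER = c("A", "B", "C"), FGA = c(99, 100, 101))

strict    <- filter(shots, FGA > 100)    # keeps only C
inclusive <- filter(shots, FGA >= 100)   # keeps B and C
not_100   <- filter(shots, FGA != 100)   # keeps A and C

nrow(strict)      # 1
nrow(inclusive)   # 2
not_100$PLAYER    # "A" "C"

# filter() returned new tbls; the input still has all three rows
nrow(shots)       # 3
```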
We can also filter on more complicated conditions built with the AND, OR, and NOT operators: &, |, and !. For instance, to keep the observations with at least 100 field goal attempts OR at least 50 three-point attempts, we would run:
> filter(raw_shooting, FGA >= 100 | TPA >= 50)
# A tibble: 6,328 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 6,318 more rows
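We can build compound conditions by grouping them in parentheses. R evaluates & before |, so parentheses make the intended grouping explicit. The code below keeps observations with at least 100 field goal attempts and at least 50 three-point attempts, or with a field goal percentage between 0.45 and 0.5 (inclusive):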
> filter(raw_shooting, (FGA >= 100 & TPA >= 50) | (FGP >= 0.45 & FGP <= 0.5))
# A tibble: 4,837 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 4,827 more rows
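To see why the parentheses matter, here is a small sketch. The first part uses plain logical values to show that & binds more tightly than |; the second applies the same compound condition as above to a hypothetical four-row tbl (assumes dplyr is installed; the data are made up).

```r
library(dplyr)

# Three logical values, for illustration only
cond1 <- TRUE
cond2 <- FALSE
cond3 <- FALSE

cond1 | cond2 & cond3     # read as cond1 | (cond2 & cond3): TRUE
(cond1 | cond2) & cond3   # parentheses change the grouping: FALSE

# Hypothetical toy player-seasons
shots <- tibble(
  PLAYER = c("A", "B", "C", "D"),
  FGA    = c(150, 150, 50, 50),
  TPA    = c(60, 10, 60, 5),
  FGP    = c(0.40, 0.47, 0.55, 0.45)
)

# A meets the volume condition; B and D meet the FGP window (0.45 is included)
kept <- filter(shots, (FGA >= 100 & TPA >= 50) | (FGP >= 0.45 & FGP <= 0.5))
kept$PLAYER   # "A" "B" "D"
```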
What if we wanted to pull out the observations corresponding to the 2015-16 and 2014-15 seasons? (In this data, each season is labeled by the year in which it ended, so 2015-16 is SEASON == 2016.) We could write filter(raw_shooting, (SEASON == 2016) | (SEASON == 2015)), which would be perfectly fine. However, what if we wanted data from 1998-99, 2011-12, and 2015-16? Typing many expressions like SEASON == ... would be rather tedious. The %in% operator lets us avoid this tedium:
> filter(raw_shooting, SEASON %in% c(1999, 2012, 2016))
# A tibble: 1,150 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 1,140 more rows
We could also filter out the data from the two lockout-shortened seasons, 1998-99 and 2011-12, by combining the NOT operator ! with %in%:
> filter(raw_shooting, !SEASON %in% c(1999, 2012))
# A tibble: 6,721 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 6,711 more rows
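Under the hood, %in% returns one TRUE or FALSE per element, so it is equivalent to a chain of == comparisons joined by |. The sketch below checks this on a hypothetical four-row tbl (assumes dplyr is installed). It also relies on the fact that %in% binds more tightly than !, so !SEASON %in% x means !(SEASON %in% x).

```r
library(dplyr)

# Hypothetical toy data
seasons <- tibble(PLAYER = c("A", "B", "C", "D"),
                  SEASON = c(1999, 2005, 2012, 2016))

# %in% asks, element by element, whether each value appears in the set
c(1999, 2005, 2012, 2016) %in% c(1999, 2012)   # TRUE FALSE TRUE FALSE

lockout <- filter(seasons, SEASON %in% c(1999, 2012))
chained <- filter(seasons, SEASON == 1999 | SEASON == 2012)
lockout$PLAYER   # "A" "C"
chained$PLAYER   # "A" "C"

# ! negates the whole %in% result
others <- filter(seasons, !SEASON %in% c(1999, 2012))
others$PLAYER    # "B" "D"
```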
For the remainder of this module, we will focus on the player-seasons with at least 100 field goal attempts, at least 100 free throw attempts, and at least 50 three-point attempts, excluding the lockout seasons.
> nba_shooting_orig <- filter(raw_shooting, FGA >= 100 & FTA >= 100 & TPA >= 50 &
+ !SEASON %in% c(1999, 2012))
> nba_shooting_orig
# A tibble: 2,254 x 11
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 2,244 more rows
In Module 1, we computed effective field goal percentage (eFGP), points scored (PTS), and true shooting percentage (TSP) from vectors containing the numbers of made and attempted field goals, three-pointers, and free throws. Now that we have substantially more data stored in our tbl nba_shooting_orig, we would like to compute these statistics for all of the players and add new columns for them. We do this with mutate():
> mutate(nba_shooting_orig, eFGP = (FGM + 0.5 * TPM)/FGA)
# A tibble: 2,254 x 12
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 2,244 more rows, and 1 more variable: eFGP <dbl>
When we run the code above, R prints a tbl whose last column is eFGP (it is listed below the table because it does not fit on screen). However, if we print nba_shooting_orig, we don’t see this column! As with filter(), mutate() does not modify its input; it returns a modified copy. So if we want a tbl that contains eFGP, we need to save it:
> nba_shooting_2 <- mutate(nba_shooting_orig, eFGP = (FGM + 0.5 * TPM)/FGA)
> nba_shooting_2
# A tibble: 2,254 x 12
PLAYER SEASON FGM FGA TPM TPA FTM FTA FGP TPP FTP
<chr> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 Stephen C… 2016 805 1597 402 887 363 400 0.504 0.453 0.908
2 James Har… 2016 710 1617 236 657 720 837 0.439 0.359 0.860
3 Kevin Dur… 2016 698 1381 186 480 447 498 0.505 0.388 0.898
4 DeMarcus … 2016 601 1332 70 210 476 663 0.451 0.333 0.718
5 LeBron Ja… 2016 737 1416 87 282 359 491 0.520 0.309 0.731
6 Damian Li… 2016 618 1474 229 610 414 464 0.419 0.375 0.892
7 Anthony D… 2016 560 1137 35 108 326 430 0.493 0.324 0.758
8 Russell W… 2016 656 1444 101 341 465 573 0.454 0.296 0.812
9 DeMar DeR… 2016 614 1377 47 139 555 653 0.446 0.338 0.850
10 Paul Geor… 2016 605 1448 210 565 454 528 0.418 0.372 0.860
# ... with 2,244 more rows, and 1 more variable: eFGP <dbl>
We now have a new tbl in our environment called nba_shooting_2, and this tbl has a column for eFGP. Recall the formulas for points scored (PTS) and true shooting percentage (TSP):
\[
\text{PTS} = \text{FTM} + 2\times \text{FGM} + \text{TPM}
\]
\[
\text{TSP} = \frac{\text{PTS}}{2\times(\text{FGA} + 0.44\times \text{FTA})}
\]
We can add both of them to our tbl using mutate():
> nba_shooting_3 <- mutate(nba_shooting_2, PTS = FTM + 2 * FGM + TPM)
> nba_shooting_4 <- mutate(nba_shooting_3, TSP = PTS/(2 * (FGA + 0.44 * FTA)))
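As a sanity check on the formulas, we can plug in one row by hand. The sketch below uses plain numeric vectors, as in Module 1, with Stephen Curry’s 2016 totals copied from the printed tbl above (FGM 805, FGA 1597, TPM 402, FTM 363, FTA 400). The results agree with the values printed for him in the select() output later in this module (eFGP 0.630, PTS 2375, TSP 0.670).

```r
# Stephen Curry's 2016 totals, copied from the tbl printed above
FGM <- 805
FGA <- 1597
TPM <- 402
FTM <- 363
FTA <- 400

eFGP <- (FGM + 0.5 * TPM) / FGA          # (805 + 201) / 1597
PTS  <- FTM + 2 * FGM + TPM              # 363 + 1610 + 402
TSP  <- PTS / (2 * (FGA + 0.44 * FTA))   # 2375 / (2 * 1773)

PTS              # 2375
round(eFGP, 3)   # 0.63
round(TSP, 3)    # 0.67
```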
Compared to nba_shooting_orig, the tbl nba_shooting_4 has three additional columns: eFGP, PTS, and TSP. To create it, we made two intermediate tbls, nba_shooting_2 and nba_shooting_3. These are no longer needed, since any analysis we would want to do with them can be done using the richer data in nba_shooting_4.
To get rid of these objects, we can use the rm() function:
> rm(nba_shooting_2, nba_shooting_3)
rm() deletes the objects whose names are listed inside the parentheses, separated by commas.
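A minimal base R sketch of rm(), using hypothetical object names (tmp_step_a and friends are made up for illustration). exists() confirms the objects are gone. rm() also accepts the names as a character vector through its list argument, which can be handy when the names are stored in a variable.

```r
# Two hypothetical intermediate objects
tmp_step_a <- data.frame(x = 1:3)
tmp_step_b <- data.frame(x = 4:6)

rm(tmp_step_a, tmp_step_b)   # names separated by commas
exists("tmp_step_a")         # FALSE
exists("tmp_step_b")         # FALSE

# The same kind of removal, with the name supplied as a character vector
tmp_step_c <- 1
rm(list = c("tmp_step_c"))
exists("tmp_step_c")         # FALSE
```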
If you’re thinking that it was inefficient to create the two intermediate tbls nba_shooting_2 and nba_shooting_3 in order to arrive at nba_shooting_4, you’re correct. It turns out that we could have done it all in one shot, as follows:
> nba_shooting <- mutate(nba_shooting_orig,
+ eFGP = (FGM + 0.5*TPM)/FGA,
+ PTS = FTM + 2*FGM + TPM,
+ TSP = PTS/(2 * (FGA + 0.44 * FTA)))
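Notice that we have put each new variable on its own line, which makes the code easier to read. Also notice that the formula for TSP refers to PTS: mutate() creates the new columns in order, so later expressions can use columns defined earlier in the same call.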
When we print both nba_shooting_4 and nba_shooting, we see that the first ten rows are identical. To verify that the two tbls are identical in full (all 2,254 rows), we can use R’s identical() function:
> identical(nba_shooting, nba_shooting_4)
[1] TRUE
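The same comparison can be reproduced on a hypothetical two-row tbl (assumes dplyr is installed; the numbers are made up). The sketch builds the result once step by step and once in a single mutate() call, then checks them with identical(). Keep in mind that identical() is strict: it also compares types, so an integer 1L and a double 1 are not identical.

```r
library(dplyr)

# Hypothetical toy totals for two player-seasons
toy <- tibble(FGM = c(5, 8), FGA = c(10, 20), TPM = c(1, 4),
              FTM = c(2, 6), FTA = c(4, 8))

# Step by step, with intermediate tbls
step1 <- mutate(toy, eFGP = (FGM + 0.5 * TPM) / FGA)
step2 <- mutate(step1, PTS = FTM + 2 * FGM + TPM)
step3 <- mutate(step2, TSP = PTS / (2 * (FGA + 0.44 * FTA)))

# All at once; TSP can use PTS because it was defined on the line above
one_shot <- mutate(toy,
                   eFGP = (FGM + 0.5 * TPM) / FGA,
                   PTS  = FTM + 2 * FGM + TPM,
                   TSP  = PTS / (2 * (FGA + 0.44 * FTA)))

identical(step3, one_shot)   # TRUE
identical(1L, 1)             # FALSE: integer versus double
```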
So far, we have used mutate() to compute numeric, or continuous, variables. Often in an analysis, however, we may want to bin these values into a small number of categories. For instance, we may (rather arbitrarily) classify players by their three-point shooting prowess as follows:
Hopeless: TPP < 0.2
Below Average: 0.2 ≤ TPP < 0.3
Average: 0.3 ≤ TPP < 0.35
Above Average: 0.35 ≤ TPP < 0.4
Elite: TPP ≥ 0.4
To add a column to nba_shooting that includes these classifications, we can use the case_when() function:
> nba_shooting <- mutate(nba_shooting,
+ Classification = case_when(
+ TPP < 0.2 ~ "Hopeless",
+ 0.2 <= TPP & TPP < 0.3 ~ "Below Average",
+ 0.3 <= TPP & TPP < 0.35 ~ "Average",
+ 0.35 <= TPP & TPP < 0.4 ~ "Above Average",
+ 0.4 <= TPP ~ "Elite"))
Let’s take a minute to unpack the code above. Within mutate(), we start as always, with the name of the new variable on the left-hand side of an equals sign. Then we call the case_when() function. Within this function, we have one line for each value of the new variable Classification. Each line contains an expression with a tilde, or “twiddle” (~): on the left of the ~ is a logical condition, and on the right is the value that Classification takes when that condition is true. case_when() checks the conditions from top to bottom and uses the first one that is TRUE; a row that satisfies none of them (for example, one with a missing TPP) gets NA.
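Because case_when() takes the first TRUE condition, the lower bounds in the code above are not strictly necessary. The sketch below (assumes dplyr is installed; the TPP values are hypothetical and chosen to hit the cutoffs) applies both the original conditions and a shorter, order-dependent version directly to a vector, and shows that they agree, including NA for a missing value.

```r
library(dplyr)

# Hypothetical three-point percentages, including cutoffs and a missing value
tpp <- c(0.15, 0.20, 0.34, 0.35, 0.45, NA)

# The conditions used above, with explicit lower and upper bounds
classify <- function(TPP) {
  case_when(
    TPP < 0.2               ~ "Hopeless",
    0.2 <= TPP & TPP < 0.3  ~ "Below Average",
    0.3 <= TPP & TPP < 0.35 ~ "Average",
    0.35 <= TPP & TPP < 0.4 ~ "Above Average",
    0.4 <= TPP              ~ "Elite"
  )
}

# Same buckets, relying on case_when() using the first TRUE condition
classify_ordered <- function(TPP) {
  case_when(
    TPP < 0.2  ~ "Hopeless",
    TPP < 0.3  ~ "Below Average",
    TPP < 0.35 ~ "Average",
    TPP < 0.4  ~ "Above Average",
    TPP >= 0.4 ~ "Elite"
  )
}

classify(tpp)
# "Hopeless" "Below Average" "Average" "Above Average" "Elite" NA
identical(classify(tpp), classify_ordered(tpp))   # TRUE
```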
Among eligible players, what was the average field goal percentage in the 2015-16 season? To answer this, we can use filter() to create a new tbl containing the data only for this season. Then we can use the dplyr verb summarize() (dplyr accepts both spellings, summarise() and summarize()) as follows:
> nba_shooting_2016 <- filter(nba_shooting, SEASON == 2016)
> summarize(nba_shooting_2016, FGP = mean(FGP))
# A tibble: 1 x 1
FGP
<dbl>
1 0.438
> summarize(nba_shooting_2016, FGP = mean(FGP), TPP = mean(TPP), FTP = mean(FTP))
# A tibble: 1 x 3
FGP TPP FTP
<dbl> <dbl> <dbl>
1 0.438 0.346 0.790
In the first example, we compute the average field goal percentage, and in the second, we compute the average field goal, three-point, and free throw percentages. Note that these are averages of the players’ individual percentages, with every player counted equally. Of course, we are not limited to computing the mean. The following functions are useful for summarizing different aspects of the distribution of the variables in our dataset:
mean(), median(): center
sd(), IQR(): spread
min(), max(): range
n(), n_distinct(): number of rows and number of distinct values
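Here is a small sketch computing every function listed above inside a single summarize() call. It assumes dplyr is installed and uses a hypothetical four-row tbl; output names such as mean_FGP are made up for illustration. Note that n() takes no arguments and only works inside dplyr verbs such as summarize().

```r
library(dplyr)

# Hypothetical field goal percentages for four player-seasons
toy <- tibble(FGP = c(0.4, 0.5, 0.5, 0.6))

stats <- summarize(toy,
                   mean_FGP   = mean(FGP),        # 0.5
                   median_FGP = median(FGP),      # 0.5
                   sd_FGP     = sd(FGP),          # about 0.0816
                   iqr_FGP    = IQR(FGP),         # 0.525 - 0.475 = 0.05
                   min_FGP    = min(FGP),         # 0.4
                   max_FGP    = max(FGP),         # 0.6
                   n_rows     = n(),              # 4 rows
                   n_values   = n_distinct(FGP))  # 3 distinct values
stats
```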
We will have much more to say about summarize() in Module 4, when we discuss grouped manipulations.
Oftentimes, the dataset you load into R contains many more columns than you need. We can use select() to pull out the columns we want to use in subsequent analyses. For instance, we may want to focus only on the columns PLAYER, SEASON, FGP, TPP, FTP, eFGP, PTS, and TSP and ignore the rest.
> select(nba_shooting, PLAYER, SEASON, FGP, TPP, FTP, eFGP, PTS, TSP)
# A tibble: 2,254 x 8
PLAYER SEASON FGP TPP FTP eFGP PTS TSP
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Stephen Curry 2016 0.504 0.453 0.908 0.630 2375 0.670
2 James Harden 2016 0.439 0.359 0.860 0.512 2376 0.598
3 Kevin Durant 2016 0.505 0.388 0.898 0.573 2029 0.634
4 DeMarcus Cousins 2016 0.451 0.333 0.718 0.477 1748 0.538
5 LeBron James 2016 0.520 0.309 0.731 0.551 1920 0.588
6 Damian Lillard 2016 0.419 0.375 0.892 0.497 1879 0.560
7 Anthony Davis 2016 0.493 0.324 0.758 0.508 1481 0.558
8 Russell Westbrook 2016 0.454 0.296 0.812 0.489 1878 0.554
9 DeMar DeRozan 2016 0.446 0.338 0.850 0.463 1830 0.550
10 Paul George 2016 0.418 0.372 0.860 0.490 1874 0.558
# ... with 2,244 more rows
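Besides listing columns by name, select() also accepts a minus sign to drop columns and a colon to take a contiguous range, and it returns columns in the order you list them. A minimal sketch on a hypothetical one-row tbl (assumes dplyr is installed):

```r
library(dplyr)

# Hypothetical one-row tbl
toy <- tibble(PLAYER = "A", SEASON = 2016, FGM = 5, FGA = 10, FGP = 0.5)

kept      <- select(toy, PLAYER, SEASON, FGP)   # list the columns to keep
dropped   <- select(toy, -FGM, -FGA)            # or drop the ones you don't want
ranged    <- select(toy, PLAYER:FGM)            # a contiguous range of columns
reordered <- select(toy, FGP, PLAYER)           # columns come back in listed order

names(kept)        # "PLAYER" "SEASON" "FGP"
names(dropped)     # "PLAYER" "SEASON" "FGP"
names(ranged)      # "PLAYER" "SEASON" "FGM"
names(reordered)   # "FGP" "PLAYER"
```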
By this point, the nba_shooting tbl contains much more information than the original data file we read in. While we can always re-run the commands in our script to rebuild this tbl, as data analyses become more complicated it is helpful to save such objects. R has its own file format (.RData) for efficiently saving objects to disk.
We will use the save() command:
> save(nba_shooting, file = "data/nba_shooting.RData")
When we want to load the data back into R, we can use the load() function, which restores nba_shooting under its original name:
> load("data/nba_shooting.RData")
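To check the round trip without touching data/nba_shooting.RData, the base R sketch below saves a hypothetical stand-in object to a temporary file, removes it, and loads it back. load() returns (invisibly) the names of the objects it restored, and the object reappears under its original name.

```r
# Hypothetical small object standing in for nba_shooting
toy_shooting <- data.frame(PLAYER = c("A", "B"), PTS = c(2375, 1920))

path <- tempfile(fileext = ".RData")   # temporary file, so nothing real is overwritten
save(toy_shooting, file = path)

rm(toy_shooting)
exists("toy_shooting")    # FALSE

loaded <- load(path)      # restores the object under its original name
loaded                    # "toy_shooting"
exists("toy_shooting")    # TRUE
```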
Up to this point, we have only used the dplyr verbs mutate(), filter(), and arrange() one at a time. What if we wanted to do something a bit more complicated that requires several of these steps in sequence? Using what we have already learned, you could accomplish such a task by creating lots of temporary tbls. In Module 4, we will learn how to string together several dplyr verbs to perform tasks like this without creating temporary tbls. We will also learn how to perform grouped calculations.