ModernDive

3.1 The pipe operator: %>%

Before we start data wrangling, let’s first introduce a nifty tool that gets loaded with the dplyr package: the pipe operator %>%. The pipe operator allows us to combine multiple operations in R into a single sequential chain of actions.

Let’s start with a hypothetical example. Say you would like to perform a hypothetical sequence of operations on a hypothetical data frame x using hypothetical functions f(), g(), and h():

  1. Take x then
  2. Use x as an input to a function f() then
  3. Use the output of f(x) as an input to a function g() then
  4. Use the output of g(f(x)) as an input to a function h()

One way to achieve this sequence of operations is by using nested parentheses as follows:

    h(g(f(x)))

This code isn’t so hard to read since we are applying only three functions, f(), then g(), then h(), and each has a short name and takes only one argument. However, code like this gets progressively harder to read as the number of functions in the sequence grows and as each function takes more arguments, because nested calls have to be read from the inside out.

This is where the pipe operator %>% comes in handy. %>% takes the output of one function and then “pipes” it to be the input of the next function. A helpful trick is to read %>% as “then” or “and then.” For example, you can obtain the same output as the hypothetical sequence of functions as follows:

    x %>% 
      f() %>% 
      g() %>% 
      h()

You would read this sequence as:

  1. Take x then
  2. Use this output as the input to the next function f() then
  3. Use this output as the input to the next function g() then
  4. Use this output as the input to the next function h()
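To make the hypothetical example concrete, here is a minimal sketch in which f(), g(), and h() are stand-in functions we define ourselves (they are illustrative assumptions, not functions from any package), applied to a small numeric vector x. It checks that the nested and piped versions give the same result.

```r
library(dplyr)  # attaching dplyr also makes the %>% pipe available

# Hypothetical stand-ins for f(), g(), and h(); any one-argument
# functions would behave the same way.
f <- function(v) v + 1
g <- function(v) v * 2
h <- function(v) sum(v)

x <- c(1, 2, 3)

# Nested parentheses: read from the inside out
nested <- h(g(f(x)))

# Pipe: read top to bottom, saying "then" at each %>%
piped <- x %>%
  f() %>%
  g() %>%
  h()

nested                    # 18
piped                     # 18
identical(nested, piped)  # TRUE
```

Both versions compute sum((x + 1) * 2) = 4 + 6 + 8 = 18; only the order in which you read them differs.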

So while both approaches achieve the same goal, the latter is much more human-readable because you can clearly read the sequence of operations line by line. But what are the hypothetical x, f(), g(), and h()? Throughout this chapter on data wrangling:

  1. The starting value x will be a data frame, for example the flights data frame we explored in Section 1.4.
  2. The sequence of functions, here f(), g(), and h(), will mostly be a chain of any number of the six verb-named data wrangling functions we listed in the introduction to this chapter, for example the filter(carrier == "AS") call we previewed earlier.
  3. The result will be the transformed or modified data frame that you want. In our example, we’ll save the result in a new data frame named alaska_flights using the <- assignment operator, as in alaska_flights <-.
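Putting the three pieces together gives a one-step pipeline. The sketch below uses a small hypothetical data frame, flights_toy, as a stand-in for flights so that it runs without extra packages; its columns and values are made up for illustration. If the nycflights13 package is installed, replacing flights_toy with flights (after library(nycflights13)) runs the same chain on the real data.

```r
library(dplyr)

# Hypothetical toy data frame standing in for nycflights13::flights;
# only two made-up columns are included.
flights_toy <- data.frame(
  carrier = c("AS", "UA", "AS", "DL"),
  flight  = c(7L, 1545L, 11L, 461L)
)

# x = flights_toy, f() = filter(carrier == "AS"), result saved with <-
alaska_flights <- flights_toy %>%
  filter(carrier == "AS")

alaska_flights  # two rows, both with carrier "AS"

# The piped data frame becomes filter()'s first argument, so this is
# the same as writing the call without the pipe:
identical(alaska_flights, filter(flights_toy, carrier == "AS"))  # TRUE
```

The last line illustrates what “piping” means in practice: x %>% f(y) is evaluated as f(x, y), which is why filter() only needs the condition inside its parentheses.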

Much like adding layers to a ggplot() with the + sign, you form a single chain of data wrangling operations by joining verb-named functions with the pipe operator %>%. Furthermore, just as the + sign has to come at the end of a line when constructing plots, the pipe operator %>% has to come at the end of a line as well, not at the start of the next one; otherwise R treats the preceding line as a complete command.
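A minimal sketch of both points, again using a hypothetical flights_toy data frame with made-up values: the first part chains two verb-named functions, filter() and arrange(), with %>% at the end of each line. The second part uses base R’s parse(), which checks syntax without running the code, to show that moving %>% to the start of a line is a syntax error. The helper parses() is our own illustrative function, not part of any package.

```r
library(dplyr)

# Hypothetical toy data frame (stand-in for flights, values made up)
flights_toy <- data.frame(
  carrier   = c("AS", "UA", "AS", "DL"),
  dep_delay = c(-5, 12, 30, 0)
)

# Each %>% ends its line, so R knows the expression continues
alaska_by_delay <- flights_toy %>%
  filter(carrier == "AS") %>%
  arrange(desc(dep_delay))

alaska_by_delay$dep_delay  # 30 -5

# Hypothetical helper: TRUE if the code parses, FALSE on a syntax error
parses <- function(code) {
  tryCatch({ parse(text = code); TRUE }, error = function(e) FALSE)
}

good <- "flights_toy %>%\n  filter(carrier == 'AS')"
bad  <- "flights_toy\n  %>% filter(carrier == 'AS')"

parses(good)  # TRUE
parses(bad)   # FALSE: line 1 is already complete, so %>% has no left side
```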

Keep in mind that there are many more advanced data wrangling functions than just the six listed in the introduction to this chapter; you’ll see some examples of these in Section 3.8. However, with just these six verb-named functions you’ll be able to perform a broad array of data wrangling tasks for the rest of this book.