## D.1 Chapter 1 Solutions

library(dplyr)
library(ggplot2)
library(nycflights13)

(LC1.1) Repeat the above installing steps, but for the dplyr, nycflights13, and knitr packages. This will install the earlier mentioned dplyr package, the nycflights13 package containing data on all domestic flights leaving a NYC airport in 2013, and the knitr package for writing reports in R.

(LC1.2) “Load” the dplyr, nycflights13, and knitr packages as well by repeating the above steps.

Solution: If the following code runs with no errors, you’ve succeeded!

library(dplyr)
library(nycflights13)
library(knitr)

(LC1.3) What does any ONE row in this flights dataset refer to?

• A. Data on an airline
• B. Data on a flight
• C. Data on an airport
• D. Data on multiple flights

Solution: This is data on a flight. Not a flight path! Example:

• a flight path would be United 1545 to Houston
• a flight would be United 1545 to Houston at a specific date/time. For example: 2013/1/1 at 5:15am.

(LC1.4) What are some examples in this dataset of categorical variables? What makes them different than quantitative variables?

Solution: Hint: Type ?flights in the console to see what all the variables mean!

• Categorical:
• carrier the company
• dest the destination
• flight the flight number. Even though this is a number, its simply a label. Example United 1545 is not less than United 1714
• Quantitative:
• distance the distance in miles
• time_hour time

(LC1.5) What properties of the observational unit do each of lat, lon, alt, tz, dst, and tzone describe for the airports data frame? Note that you may want to use ?airports to get more information.

Solution: lat long represent the airport geographic coordinates, alt is the altitude above sea level of the airport (Run airports %>% filter(faa == "DEN") to see the altitude of Denver International Airport), tz is the time zone difference with respect to GMT in London UK, dst is the daylight savings time zone, and tzone is the time zone label.

(LC1.6) Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions.

Solution:

• In the weather example in LC2.8, the combination of origin, year, month, day, hour are identification variables as they identify the observation in question.
• Anything else pertains to observations: temp, humid, wind_speed, etc.