ModernDive

D.6 Chapter 6 Solutions

(LC6.1) Compute the observed values, fitted values, and residuals not for the interaction model as we just did, but rather for the parallel slopes model we saved in score_model_parallel_slopes.

Solution:

# A tibble: 463 x 6
      ID score   age gender score_hat residual
   <int> <dbl> <int> <fct>      <dbl>    <dbl>
 1     1   4.7    36 female     4.172    0.528
 2     2   4.1    36 female     4.172   -0.072
 3     3   3.9    36 female     4.172   -0.272
 4     4   4.8    36 female     4.172    0.628
 5     5   4.6    59 male       4.163    0.437
 6     6   4.3    59 male       4.163    0.137
 7     7   2.8    59 male       4.163   -1.363
 8     8   4.1    51 male       4.232   -0.132
 9     9   3.4    51 male       4.232   -0.832
10    10   4.5    40 female     4.137    0.363
# … with 453 more rows
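
A table like the one above can be produced along the following lines. This is a minimal sketch, assuming the `evals` data frame shipped with the moderndive package and assuming the parallel slopes model was fit as `score ~ age + gender`; the names `evals_ch6` and `regression_points` are illustrative choices, not taken from this solution.

```r
library(moderndive)
library(dplyr)

# Assumption: the chapter's data are the ID, score, age, and gender
# columns of moderndive::evals.
evals_ch6 <- evals %>%
  select(ID, score, age, gender)

# Parallel slopes model: one common slope for age, separate intercepts by gender.
score_model_parallel_slopes <- lm(score ~ age + gender, data = evals_ch6)

# Observed values (score), fitted values (score_hat), and residuals,
# where residual = score - score_hat.
regression_points <- get_regression_points(score_model_parallel_slopes)
regression_points
```

Because the slope for age is shared, two instructors of the same age and gender get the same fitted value, which is why `score_hat` repeats across several rows.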

(LC6.2) Conduct a new exploratory data analysis with the same outcome variable \(y\) being debt but with credit_rating and age as the new explanatory variables \(x_1\) and \(x_2\). Remember, this involves three things:

  1. Most crucially: Looking at the raw data values.
  2. Computing summary statistics, such as means, medians, and interquartile ranges.
  3. Creating data visualizations.

What can you say about the relationship between a credit card holder’s debt and their credit rating and age?

Solution:

  • Most crucially: Looking at the raw data values.
# A tibble: 6 x 3
   debt credit_rating   age
  <int>         <int> <int>
1   333           283    34
2   903           483    82
3   580           514    71
4   964           681    36
5   331           357    68
6  1151           569    77
  • Computing summary statistics, such as means, medians, and interquartile ranges.
Skim summary statistics
 n obs: 400 
 n variables: 3 
 group variables:  

── Variable type:integer ───────────────────────────────────────────────────────
      variable missing complete   n   mean     sd p0    p25   p50    p75 p100
           age       0      400 400  55.67  17.25 23  41.75  56    70      98
 credit_rating       0      400 400 354.94 154.72 93 247.25 344   437.25  982
          debt       0      400 400 520.01 459.76  0  68.75 459.5 863    1999
  • Creating data visualizations.
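
The three steps above can be sketched in R as follows. This assumes that `credit_ch6` is built from the `Credit` data in the ISLR package by renaming `Balance`, `Rating`, and `Age`; the object names `credit_eda`, `plot_rating`, and `plot_age` are hypothetical, and the plot labels are illustrative.

```r
library(ISLR)
library(dplyr)
library(ggplot2)
library(skimr)

# Assumption: credit_ch6 is derived from ISLR::Credit with renamed columns.
credit_ch6 <- Credit %>%
  as_tibble() %>%
  select(debt = Balance, credit_rating = Rating, age = Age)

credit_eda <- credit_ch6 %>%
  select(debt, credit_rating, age)

# 1. Look at the raw data values.
head(credit_eda)

# 2. Compute summary statistics and pairwise correlations.
skim(credit_eda)
cor(credit_eda)

# 3. Visualize debt against each explanatory variable,
#    with a least-squares line added to each scatterplot.
plot_rating <- ggplot(credit_eda, aes(x = credit_rating, y = debt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Credit rating", y = "Credit card debt",
       title = "Debt and credit rating")

plot_age <- ggplot(credit_eda, aes(x = age, y = debt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Age (in years)", y = "Credit card debt",
       title = "Debt and age")

plot_rating
plot_age
```

The correlation matrix gives a numerical counterpart to the two scatterplots, so the strength of each relationship can be compared directly.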

It seems that there is a positive relationship between one’s credit rating and their debt, and very little relationship between one’s age and their debt.

(LC6.3) Fit a new multiple linear regression using lm(debt ~ credit_rating + age, data = credit_ch6) where credit_rating and age are the new numerical explanatory variables \(x_1\) and \(x_2\). Get information about the “best-fitting” regression plane from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your previous exploratory data analysis?

Solution:

# A tibble: 3 x 7
  term          estimate std_error statistic p_value lower_ci upper_ci
  <chr>            <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept     -269.581    44.806    -6.017       0 -357.668 -181.494
2 credit_rating    2.593     0.074    34.84        0    2.447    2.74 
3 age             -2.351     0.668    -3.521       0   -3.663   -1.038
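
The regression table above can be reproduced roughly as follows. This is a sketch, assuming `credit_ch6` is derived from `ISLR::Credit` as shown in the code; the names `debt_model_2` and `reg_table` are hypothetical.

```r
library(ISLR)
library(dplyr)
library(moderndive)

# Assumption: credit_ch6 is derived from ISLR::Credit with renamed columns.
credit_ch6 <- Credit %>%
  as_tibble() %>%
  select(debt = Balance, credit_rating = Rating, age = Age)

# Fit the regression plane with two numerical explanatory variables.
debt_model_2 <- lm(debt ~ credit_rating + age, data = credit_ch6)

# Estimates, standard errors, test statistics, p-values,
# and 95% confidence intervals for each term.
reg_table <- get_regression_table(debt_model_2)
reg_table
```

Note that `get_regression_table()` rounds to three digits by default, so a p-value printed as 0 means a value smaller than 0.0005, not exactly zero.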

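The estimated coefficients for credit_rating (\(x_1\)) and age (\(x_2\)) are \(2.59\) and \(-2.35\), respectively. These are partial slopes rather than correlations: taking age into account, each additional point of credit rating is associated with an increase in debt of about \(2.59\) on average, and taking credit rating into account, each additional year of age is associated with a decrease in debt of about \(2.35\) on average. The positive slope for credit_rating matches the positive relationship seen in the exploratory data analysis. The slope for age is negative but small relative to the variation in debt (a standard deviation of about \(460\)), which is in line with the very weak relationship between age and debt seen earlier.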
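
As a worked illustration, the rounded estimates from the regression table define the fitted plane below. Plugging in the first row of the raw data shown in LC6.2 (debt 333, credit_rating 283, age 34) gives a fitted value and a residual; because the coefficients are rounded to three digits, the result is approximate.

```latex
\[
\widehat{\text{debt}} = -269.581 + 2.593 \cdot \text{credit\_rating} - 2.351 \cdot \text{age}
\]
\[
\widehat{\text{debt}} = -269.581 + 2.593(283) - 2.351(34) \approx 384.30,
\qquad
\text{residual} = 333 - 384.30 \approx -51.30
\]
```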