## D.6 Chapter 6 Solutions

library(tidyverse)
library(moderndive)
library(skimr)
library(ISLR)

(LC6.1) Compute the observed values, fitted values, and residuals not for the interaction model as we just did, but rather for the parallel slopes model we saved in score_model_parallel_slopes.

Solution:

regression_points_parallel <- get_regression_points(score_model_parallel_slopes)
regression_points_parallel
# A tibble: 463 x 6
      ID score   age gender score_hat residual
   <int> <dbl> <int> <fct>      <dbl>    <dbl>
 1     1   4.7    36 female     4.172    0.528
 2     2   4.1    36 female     4.172   -0.072
 3     3   3.9    36 female     4.172   -0.272
 4     4   4.8    36 female     4.172    0.628
 5     5   4.6    59 male       4.163    0.437
 6     6   4.3    59 male       4.163    0.137
 7     7   2.8    59 male       4.163   -1.363
 8     8   4.1    51 male       4.232   -0.132
 9     9   3.4    51 male       4.232   -0.832
10    10   4.5    40 female     4.137    0.363
# … with 453 more rows
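
As a quick sanity check on the output above, each residual should equal the observed score minus the fitted value score_hat. The following is a minimal sketch in base R, with the first five rows typed in by hand from the table; the vector names are illustrative and not part of moderndive.

```r
# First five rows of the regression points table, copied by hand
score     <- c(4.7, 4.1, 3.9, 4.8, 4.6)
score_hat <- c(4.172, 4.172, 4.172, 4.172, 4.163)

# A residual is the observed value minus the fitted value
residual <- score - score_hat
round(residual, 3)
# 0.528 -0.072 -0.272  0.628  0.437
```

These values agree with the residual column reported by get_regression_points().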

(LC6.2) Conduct a new exploratory data analysis with the same outcome variable $$y$$ being debt but with credit_rating and age as the new explanatory variables $$x_1$$ and $$x_2$$. Remember, this involves three things:

1. Most crucially: Looking at the raw data values.
2. Computing summary statistics, such as means, medians, and interquartile ranges.
3. Creating data visualizations.

What can you say about the relationship between a credit card holder’s debt and their credit rating and age?

Solution:

• Most crucially: Looking at the raw data values.
credit_ch6 %>%
  select(debt, credit_rating, age) %>%
  head()
# A tibble: 6 x 3
   debt credit_rating   age
  <int>         <int> <int>
1   333           283    34
2   903           483    82
3   580           514    71
4   964           681    36
5   331           357    68
6  1151           569    77
• Computing summary statistics, such as means, medians, and interquartile ranges.
skim_with(numeric = list(hist = NULL), integer = list(hist = NULL))
credit_ch6 %>%
  select(debt, credit_rating, age) %>%
  skim()
Skim summary statistics
 n obs: 400
 n variables: 3
 group variables:

── Variable type:integer ───────────────────────────────────────────────────────
variable      missing complete   n   mean     sd p0    p25   p50    p75 p100
age                 0      400 400  55.67  17.25 23  41.75  56    70      98
credit_rating       0      400 400 354.94 154.72 93 247.25 344   437.25  982
debt                0      400 400 520.01 459.76  0  68.75 459.5 863    1999
• Creating data visualizations.
ggplot(credit_ch6, aes(x = credit_rating, y = debt)) +
  geom_point() +
  labs(
    x = "Credit rating", y = "Credit card debt (in $)",
    title = "Debt and credit rating"
  ) +
  geom_smooth(method = "lm", se = FALSE)

ggplot(credit_ch6, aes(x = age, y = debt)) +
  geom_point() +
  labs(
    x = "Age (in years)", y = "Credit card debt (in $)",
    title = "Debt and age"
  ) +
  geom_smooth(method = "lm", se = FALSE)

It seems that there is a positive relationship between one’s credit rating and their debt, and very little relationship between one’s age and their debt.
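
One way to back up this visual impression with numbers is to compute correlation coefficients. The sketch below assumes, as the summary statistics suggest, that credit_ch6 was built from the Credit data frame in ISLR, with debt, credit_rating, and age corresponding to Balance, Rating, and Age. If your credit_ch6 was constructed differently, substitute its columns instead.

```r
library(ISLR)

# Assumption: debt = Balance, credit_rating = Rating, age = Age in ISLR::Credit
cor_rating <- cor(Credit$Balance, Credit$Rating)
cor_age    <- cor(Credit$Balance, Credit$Age)

# A value near +1 indicates a strong positive linear relationship,
# a value near 0 indicates little linear relationship
c(credit_rating = cor_rating, age = cor_age)
```

A large positive correlation for credit rating and a correlation close to zero for age would be consistent with the two scatterplots.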

(LC6.3) Fit a new multiple regression model using lm(debt ~ credit_rating + age, data = credit_ch6) where credit_rating and age are the new numerical explanatory variables $$x_1$$ and $$x_2$$. Get information about the “best-fitting” regression plane from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your previous exploratory data analysis?

Solution:

# Fit regression model:
debt_model_2 <- lm(debt ~ credit_rating + age, data = credit_ch6)
# Get regression table:
get_regression_table(debt_model_2)
# A tibble: 3 x 7
  term          estimate std_error statistic p_value lower_ci upper_ci
  <chr>            <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept     -269.581    44.806    -6.017       0 -357.668 -181.494
2 credit_rating    2.593     0.074    34.84        0    2.447    2.74
3 age             -2.351     0.668    -3.521       0   -3.663   -1.038
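
Written out as an equation, the estimate column of the table above corresponds to the following fitted regression plane, with coefficients rounded to three decimals as in the table output:

$$\widehat{\text{debt}} = -269.581 + 2.593 \cdot \text{credit\_rating} - 2.351 \cdot \text{age}$$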

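The estimated coefficients for the two numerical explanatory variables $$x_1$$ and $$x_2$$, credit_rating and age, are $$2.59$$ and $$-2.35$$ respectively. This means that debt is positively associated with credit_rating and, once credit_rating is taken into account, negatively associated with age. The credit_rating result matches the clear positive relationship seen in our previous exploratory data analysis. The age coefficient is a partial slope: it describes how debt changes with age among card holders with the same credit rating, so it is not directly comparable to the nearly flat regression line in the scatterplot of debt against age alone.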
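
To see how these coefficients turn into a fitted value, the following sketch applies them to the first card holder from the raw data in LC6.2 (debt 333, credit rating 283, age 34). The function predict_debt() and the variable names are illustrative helpers, not part of moderndive. Because the coefficients are rounded to three decimals, the result may differ slightly from what get_regression_points(debt_model_2) would report.

```r
# Rounded coefficients copied from the regression table
b0       <- -269.581
b_rating <- 2.593
b_age    <- -2.351

# Fitted plane: debt_hat = b0 + b_rating * credit_rating + b_age * age
predict_debt <- function(credit_rating, age) {
  b0 + b_rating * credit_rating + b_age * age
}

debt_hat <- predict_debt(credit_rating = 283, age = 34)
debt_hat        # about 384.3
333 - debt_hat  # residual for this card holder, about -51.3
```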