ModernDive

10.2 Interpreting regression tables

We’ve so far focused only on the two leftmost columns of the regression table in Table 10.1: term and estimate. Let’s now shift our attention to the remaining columns: std_error, statistic, p_value, lower_ci and upper_ci in Table 10.3.

TABLE 10.3: Previously seen regression table
term estimate std_error statistic p_value lower_ci upper_ci
intercept 3.880 0.076 50.96 0 3.731 4.030
bty_avg 0.067 0.016 4.09 0 0.035 0.099

Given the lack of practical interpretation for the fitted intercept \(b_0\), in this section we’ll focus only on the second row of the table corresponding to the fitted slope \(b_1\). We’ll first interpret the std_error, statistic, p_value, lower_ci and upper_ci columns. Afterwards in the upcoming Subsection 10.2.5, we’ll discuss how R computes these values.

10.2.1 Standard error

The third column of the regression table in Table 10.1, std_error, corresponds to the standard error of our estimates. Recall the definition of standard error we saw in Subsection 7.3.2:

The standard error is the standard deviation of any point estimate computed from a sample.

So what does this mean in terms of the fitted slope \(b_1\) = 0.067? This value is just one possible value of the fitted slope resulting from this particular sample of \(n\) = 463 pairs of teaching and beauty scores. However, if we collected a different sample of \(n\) = 463 pairs of teaching and beauty scores, we would almost certainly obtain a different fitted slope \(b_1\). This is due to sampling variability.

Say we hypothetically collected 1000 such samples of pairs of teaching and beauty scores, computed the 1000 resulting values of the fitted slope \(b_1\), and visualized them in a histogram. This would be a visualization of the sampling distribution of \(b_1\), which we defined in Subsection 7.3.2. Further recall that the standard deviation of the sampling distribution of \(b_1\) has a special name: the standard error.

Recall that we constructed three sampling distributions for the sample proportion \(\widehat{p}\) using shovels of size 25, 50, and 100 in Figure 7.12. We observed that as the sample size increased, the standard error decreased as evidenced by the narrowing sampling distribution.

The standard error of \(b_1\) similarly quantifies how much variation in the fitted slope \(b_1\) one would expect between different samples. So in our case, we can expect the fitted slope for bty_avg to vary by about 0.016 units from sample to sample. Recall that the estimate and std_error values play a key role in inferring the value of the unknown population slope \(\beta_1\) relating to all instructors.
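To make the idea of “many hypothetical samples” concrete, here is a minimal sketch in base R. It assumes a made-up population (the intercept 3.9, slope 0.07, and noise level 0.5 are illustrative values, not the real evaluations data) and a hypothetical helper `fitted_slope()`. It repeatedly samples from that population, refits the regression, and takes the standard deviation of the resulting fitted slopes.

```r
# Hypothetical population: these parameter values are made up for illustration.
set.seed(76)
n_pop <- 10000
population <- data.frame(bty_avg = runif(n_pop, min = 1, max = 9))
population$score <- 3.9 + 0.07 * population$bty_avg + rnorm(n_pop, sd = 0.5)

# Draw one sample of size n (without replacement) and return its fitted slope b_1.
fitted_slope <- function(n) {
  rows <- sample(nrow(population), size = n)
  coef(lm(score ~ bty_avg, data = population[rows, ]))[["bty_avg"]]
}

# 1000 samples of n = 463 give 1000 fitted slopes: a simulated sampling distribution.
slopes_463 <- replicate(1000, fitted_slope(463))

# The standard error is the standard deviation of these fitted slopes.
se_463 <- sd(slopes_463)

# Repeating with a smaller sample size should give a larger standard error.
se_50 <- sd(replicate(1000, fitted_slope(50)))
c(n_463 = se_463, n_50 = se_50)
```

Calling `hist(slopes_463)` afterwards would draw the histogram described above. The comparison between the two sample sizes mirrors what we saw with the shovels: larger samples produce a narrower sampling distribution.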

In Section 10.4, we’ll perform a simulation using the infer package to construct the bootstrap distribution for \(b_1\) in this case. Recall from Subsection 8.7.1 that the bootstrap distribution is an approximation to the sampling distribution in that they have a similar shape. Since they have a similar shape, they have similar standard errors. However, unlike the sampling distribution, the bootstrap distribution is constructed from a single sample, which is a practice more aligned with what’s done in real life.
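Before we get to the infer package, here is a rough base R sketch of the same bootstrap idea. It assumes a simulated stand-in sample (not the real evaluations data) and a hypothetical helper `boot_slope()`. It resamples the rows of that single sample with replacement, refits the regression each time, and compares the standard deviation of the bootstrap slopes with the formula-based standard error that `lm()` reports.

```r
set.seed(76)
# Hypothetical single sample of n = 463 (simulated; not the real evaluations data).
n <- 463
sample_df <- data.frame(bty_avg = runif(n, min = 1, max = 9))
sample_df$score <- 3.9 + 0.07 * sample_df$bty_avg + rnorm(n, sd = 0.5)

# One bootstrap replicate: resample n rows with replacement, refit, keep the slope.
boot_slope <- function(df) {
  rows <- sample(nrow(df), size = nrow(df), replace = TRUE)
  coef(lm(score ~ bty_avg, data = df[rows, ]))[["bty_avg"]]
}

boot_slopes <- replicate(1000, boot_slope(sample_df))

# Bootstrap standard error versus the formula-based one reported by lm().
se_boot <- sd(boot_slopes)
fit <- lm(score ~ bty_avg, data = sample_df)
se_theory <- summary(fit)$coefficients["bty_avg", "Std. Error"]
c(bootstrap = se_boot, theory = se_theory)
```

The two standard errors should be close, though not identical, since the bootstrap value depends on the random resamples.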

10.2.2 Test statistic

The fourth column of the regression table in Table 10.1, statistic, corresponds to a test statistic relating to the following hypothesis test:

\[ \begin{aligned} H_0 &: \beta_1 = 0\\ \text{vs } H_A&: \beta_1 \neq 0. \end{aligned} \]

Recall our terminology, notation, and definitions related to hypothesis tests we introduced in Section 9.2.

A hypothesis test consists of a test between two competing hypotheses: (1) a null hypothesis \(H_0\) versus (2) an alternative hypothesis \(H_A\).

A test statistic is a point estimate/sample statistic formula used for hypothesis testing.

Here, our null hypothesis \(H_0\) assumes that the population slope \(\beta_1\) is 0. If the population slope \(\beta_1\) is truly 0, then this is saying that there is no true relationship between teaching and “beauty” scores for all the instructors in our population. In other words, \(x\) = “beauty” score would have no associated effect on \(y\) = teaching score. The alternative hypothesis \(H_A\), on the other hand, assumes that the population slope \(\beta_1\) is not 0, meaning it could be either positive or negative. This suggests either a positive or negative relationship between teaching and “beauty” scores. Recall we called such alternative hypotheses two-sided. By convention, all hypothesis testing for regression assumes two-sided alternatives.

Recall our “hypothesized universe” of no gender discrimination we assumed in our promotions activity in Section 9.1. Similarly here when conducting this hypothesis test, we’ll assume a “hypothesized universe” where there is no relationship between teaching and “beauty” scores. In other words, we’ll assume the null hypothesis \(H_0: \beta_1 = 0\) is true.

The statistic column in the regression table is a tricky one, however. It corresponds to a standardized t-test statistic, much like the two-sample \(t\) statistic we saw in Subsection 9.6.1 where we used a theory-based method for conducting hypothesis tests. In both these cases, the null distribution can be mathematically proven to be a \(t\)-distribution. Since such test statistics are tricky for individuals new to statistical inference to study, we’ll skip this and jump into interpreting the \(p\)-value. If you’re curious, we have included a discussion of this standardized t-test statistic in Subsection 10.5.1.
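For the curious, the standardized statistic can be sketched with simple arithmetic: the fitted slope minus its value under \(H_0\), divided by its standard error. The snippet below uses the rounded values from the bty_avg row of the table, so it is an approximation rather than a reproduction of R’s internal calculation.

```r
# Values copied from the bty_avg row of the regression table (already rounded).
b1 <- 0.067       # estimate
se_b1 <- 0.016    # std_error
null_value <- 0   # value of beta_1 under H_0

# Standardized test statistic: how many standard errors b1 lies from the null value.
t_stat <- (b1 - null_value) / se_b1
t_stat
```

The result is about 4.19 rather than the 4.09 shown in the table. The gap is consistent with R working from the unrounded estimate and standard error, whereas we only have three decimal places to work with here.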

10.2.3 p-value

The fifth column of the regression table in Table 10.1, p_value, corresponds to the p-value of the hypothesis test \(H_0: \beta_1 = 0\) versus \(H_A: \beta_1 \neq 0\).

Again recalling our terminology, notation, and definitions related to hypothesis tests we introduced in Section 9.2, let’s focus on the definition of the \(p\)-value:

A p-value is the probability of obtaining a test statistic just as extreme or more extreme than the observed test statistic assuming the null hypothesis \(H_0\) is true.

Recall that you can intuitively think of the \(p\)-value as quantifying how “extreme” the observed fitted slope of \(b_1\) = 0.067 is in a “hypothesized universe” where there is no relationship between teaching and “beauty” scores.

Following the hypothesis testing procedure we outlined in Section 9.4, since the \(p\)-value in this case is 0 when rounded to three decimal places, for any commonly used significance level \(\alpha\) we would reject \(H_0\) in favor of \(H_A\). Using non-statistical language, this is saying: we reject the hypothesis that there is no relationship between teaching and “beauty” scores in favor of the hypothesis that there is. That is to say, the evidence suggests there is a significant relationship, one that is positive.

More precisely, however, the \(p\)-value corresponds to how extreme the observed test statistic of 4.09 is when compared to the appropriate null distribution. In Section 10.4, we’ll perform a simulation using the infer package to construct the null distribution in this case.
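As a theory-based preview, the \(p\)-value can be approximated from the observed test statistic using the \(t\)-distribution mentioned earlier. The sketch below assumes \(n - 2 = 461\) degrees of freedom, the usual choice for simple linear regression; this value is not stated in this section, so treat it as an assumption.

```r
t_obs <- 4.09          # statistic column of the regression table
n <- 463               # number of pairs of teaching and beauty scores
deg_free <- n - 2      # assumed degrees of freedom for simple linear regression

# Two-sided p-value: probability of a t statistic at least as extreme as t_obs,
# in either tail, if H_0 is true.
p_value <- 2 * pt(abs(t_obs), df = deg_free, lower.tail = FALSE)
p_value
round(p_value, 3)
```

The unrounded value is tiny but strictly positive; it only appears as 0 once it is rounded to three decimal places, as in the table.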

An extra caveat here is that the results of this hypothesis test are only valid if certain “conditions for inference for regression” are met, which we’ll introduce shortly in Section 10.3.

10.2.4 Confidence interval

The two rightmost columns of the regression table in Table 10.1 (lower_ci and upper_ci) correspond to the endpoints of the 95% confidence interval for the population slope \(\beta_1\). Recall our analogy from Section 8.3 that “nets are to fish” what “confidence intervals are to population parameters.” The resulting 95% confidence interval for \(\beta_1\) of (0.035, 0.099) can be thought of as a range of plausible values for the population slope \(\beta_1\) of the linear relationship between teaching and “beauty” scores.

As we introduced in Subsection 8.5.2 on the precise and shorthand interpretation of confidence intervals, the statistically precise interpretation of this confidence interval is: “if we repeated this sampling procedure a large number of times, we expect about 95% of the resulting confidence intervals to capture the value of the population slope \(\beta_1\).” However, we’ll summarize this using our shorthand interpretation that “we’re 95% ‘confident’ that the true population slope \(\beta_1\) lies between 0.035 and 0.099.”
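The precise interpretation can be checked with a small simulation. The sketch below assumes a hypothetical population slope of 0.07 and made-up data-generating values (they are not estimates from the real evaluations data), along with a hypothetical helper `captures_beta_1()`. It repeats the sampling procedure 1000 times and records how often the 95% confidence interval captures the true slope.

```r
set.seed(76)
beta_1 <- 0.07  # hypothetical true population slope

# Simulate one sample, fit the regression, and check whether
# the 95% confidence interval for the slope captures beta_1.
captures_beta_1 <- function(n = 100) {
  x <- runif(n, min = 1, max = 9)
  y <- 3.9 + beta_1 * x + rnorm(n, sd = 0.5)
  ci <- confint(lm(y ~ x), parm = "x", level = 0.95)
  ci[1, 1] <= beta_1 && beta_1 <= ci[1, 2]
}

# Proportion of 1000 simulated intervals that capture beta_1.
coverage <- mean(replicate(1000, captures_beta_1()))
coverage
```

The proportion should land near 0.95, with some wobble from the randomness of the simulation.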

Notice in this case that the resulting 95% confidence interval for \(\beta_1\) of \((0.035, \, 0.099)\) does not contain a very particular value: \(\beta_1\) equals 0. Recall we mentioned that if the population regression slope \(\beta_1\) is 0, this is equivalent to saying there is no relationship between teaching and “beauty” scores. Since \(\beta_1\) = 0 is not in our plausible range of values for \(\beta_1\), we are inclined to believe that there, in fact, is a relationship between teaching and “beauty” scores and a positive one at that. So in this case, the conclusion about the population slope \(\beta_1\) from the 95% confidence interval matches the conclusion from the hypothesis test: evidence suggests that there is a statistically significant relationship between teaching and “beauty” scores.

Recall from Subsection 8.5.3, however, that the confidence level is one of many factors that determine confidence interval widths. So for example, say we used a higher confidence level of 99% instead of 95%. The resulting confidence interval for \(\beta_1\) would be wider and thus might now include 0. The lesson to remember here is that any confidence-interval-based conclusion depends highly on the confidence level used.

What are the calculations that went into computing the two endpoints of the 95% confidence interval for \(\beta_1\)?

Recall our sampling bowl example from Subsection 8.7.2 discussing lower_ci and upper_ci. Since the sampling and bootstrap distributions of the sample proportion \(\widehat{p}\) were roughly normal, we could use the rule of thumb for bell-shaped distributions from Appendix A.2 to create a 95% confidence interval for \(p\) with the following equation:

\[\widehat{p} \pm \text{MoE}_{\widehat{p}} = \widehat{p} \pm 1.96 \cdot \text{SE}_{\widehat{p}} = \widehat{p} \pm 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]

We can generalize this to other point estimates that have roughly normally shaped sampling and/or bootstrap distributions:

\[\text{point estimate} \pm \text{MoE} = \text{point estimate} \pm 1.96 \cdot \text{SE}.\]

We’ll show in Section 10.4 that the sampling/bootstrap distribution for the fitted slope \(b_1\) is in fact bell-shaped as well. Thus we can construct a 95% confidence interval for \(\beta_1\) with the following equation:

\[b_1 \pm \text{MoE}_{b_1} = b_1 \pm 1.96 \cdot \text{SE}_{b_1}.\]

What is the value of the standard error \(\text{SE}_{b_1}\)? It is in fact in the third column of the regression table in Table 10.1: 0.016. Thus

\[ \begin{aligned} b_1 \pm 1.96 \cdot \text{SE}_{b_1} &= 0.067 \pm 1.96 \cdot 0.016 = 0.067 \pm 0.031\\ &= (0.036, 0.098) \end{aligned} \]

This closely matches the \((0.035, 0.099)\) confidence interval in the last two columns of Table 10.1.
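The same arithmetic can be sketched in R. The helper `ci_slope()` below is a hypothetical name of our own; it uses the rounded table values and the normal-based critical value from `qnorm()`, so it follows the rule of thumb above rather than reproducing exactly what R did to build the table.

```r
b1 <- 0.067      # estimate from the regression table
se_b1 <- 0.016   # std_error from the regression table

# Normal-approximation confidence interval: b1 +/- (critical value) * SE.
ci_slope <- function(level) {
  critical_value <- qnorm(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
  b1 + c(lower = -1, upper = 1) * critical_value * se_b1
}

round(ci_slope(0.95), 3)  # 0.036 and 0.098, matching the hand calculation
round(ci_slope(0.99), 3)  # 0.026 and 0.108
```

Raising the confidence level to 99% widens the interval, as expected. With these particular values the wider interval still does not contain 0.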

Much like hypothesis tests, however, the results of this confidence interval also are only valid if the “conditions for inference for regression” to be discussed in Section 10.3 are met.

10.2.5 How does R compute the table?

Since we didn’t perform the simulation to get the values of the standard error, test statistic, \(p\)-value, and endpoints of the 95% confidence interval in Table 10.1, you might be wondering how these values were computed. What did R do behind the scenes? Does R run simulations like we did using the infer package in Chapters 8 and 9 on confidence intervals and hypothesis testing?

The answer is no! Much like the theory-based method for constructing confidence intervals you saw in Subsection 8.7.2 and the theory-based hypothesis test you saw in Subsection 9.6.1, there exist mathematical formulas that allow you to construct confidence intervals and conduct hypothesis tests for inference for regression. These formulas were derived in a time when computers didn’t exist, so it would’ve been impossible to run the extensive computer simulations we have in this book. We present these formulas in Subsection 10.5.1 on “theory-based inference for regression.”
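To see that no simulation is involved, here is a minimal base R sketch on simulated stand-in data (the data-generating values are made up, and the wrapper function that produced Table 10.1 is not shown in this section, so we use `lm()`, `summary()`, and `confint()` directly). It also rebuilds the slope’s test statistic, \(p\)-value, and confidence interval by hand from the estimate and standard error.

```r
set.seed(76)
# Hypothetical data standing in for the evaluations sample.
n <- 463
bty_avg <- runif(n, min = 1, max = 9)
score <- 3.9 + 0.07 * bty_avg + rnorm(n, sd = 0.5)

fit <- lm(score ~ bty_avg)

# Formula-based estimate, standard error, t statistic, and p-value: no resampling.
coefs <- summary(fit)$coefficients
coefs

# Formula-based 95% confidence intervals.
ci <- confint(fit, level = 0.95)
ci

# Rebuild the slope's statistic, p-value, and interval by hand.
est <- coefs["bty_avg", "Estimate"]
se <- coefs["bty_avg", "Std. Error"]
t_by_hand <- est / se
p_by_hand <- 2 * pt(abs(t_by_hand), df = fit$df.residual, lower.tail = FALSE)
ci_by_hand <- est + c(-1, 1) * qt(0.975, df = fit$df.residual) * se
```

The hand-built values agree with the corresponding entries of `coefs` and `ci`. Note that `confint()` uses a critical value from the \(t\)-distribution rather than 1.96, which is one reason a rule-of-thumb interval can differ slightly in the third decimal place.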

In Section 10.4, we’ll go over a simulation-based approach to constructing confidence intervals and conducting hypothesis tests using the infer package. In particular, we’ll convince you that the bootstrap distribution of the fitted slope \(b_1\) is indeed bell-shaped.