## 10.2 Interpreting regression tables

We’ve so far focused only on the two leftmost columns of the regression table in Table 10.1: `term` and `estimate`. Let’s now shift our attention to the remaining columns of Table 10.3: `std_error`, `statistic`, `p_value`, `lower_ci`, and `upper_ci`.

term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
---|---|---|---|---|---|---|
intercept | 3.880 | 0.076 | 50.96 | 0 | 3.731 | 4.030 |
bty_avg | 0.067 | 0.016 | 4.09 | 0 | 0.035 | 0.099 |

Given the lack of practical interpretation for the fitted intercept \(b_0\), in this section we’ll focus only on the second row of the table corresponding to the fitted slope \(b_1\). We’ll first interpret the `std_error`, `statistic`, `p_value`, `lower_ci`, and `upper_ci` columns. Afterwards, in Subsection 10.2.5, we’ll discuss how R computes these values.

### 10.2.1 Standard error

The third column of the regression table in Table 10.1 `std_error`

corresponds to the *standard error* of our estimates. Recall the definition of **standard error** we saw in Subsection 7.3.2:

> The **standard error** is the standard deviation of any point estimate computed from a sample.

So what does this mean in terms of the fitted slope \(b_1\) = 0.067? This value is just one possible value of the fitted slope resulting from *this particular sample* of \(n\) = 463 pairs of teaching and beauty scores. However, if we collected a different sample of \(n\) = 463 pairs of teaching and beauty scores, we will almost certainly obtain a different fitted slope \(b_1\). This is due to *sampling variability*.

Say we hypothetically collected 1000 such samples of pairs of teaching and beauty scores, computed the 1000 resulting values of the fitted slope \(b_1\), and visualized them in a histogram. This would be a visualization of the *sampling distribution* of \(b_1\), which we defined in Subsection 7.3.2. Further recall that the standard deviation of the *sampling distribution* of \(b_1\) has a special name: the *standard error*.
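To make this hypothetical exercise concrete, here is a minimal Python sketch (the analyses in this book use R; the synthetic “population” below, with an invented true slope of 0.067 and noise standard deviation of 0.5, is a stand-in for the real instructor data): we draw 1000 samples of \(n\) = 463 pairs, fit a slope to each, and take the standard deviation of the 1000 slopes.

```python
import random
import statistics

def fitted_slope(pairs):
    """Least-squares slope b1 for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

random.seed(76)

def draw_sample(n=463):
    """One hypothetical sample of (beauty, teaching) score pairs."""
    pairs = []
    for _ in range(n):
        x = random.uniform(2, 8)  # invented "beauty" score
        pairs.append((x, 3.88 + 0.067 * x + random.gauss(0, 0.5)))
    return pairs

# 1000 hypothetical samples -> 1000 fitted slopes b1
slopes = [fitted_slope(draw_sample()) for _ in range(1000)]

# The standard deviation of these 1000 slopes is the standard error of b1
se_b1 = statistics.stdev(slopes)
```

The resulting `se_b1` plays the role of the standard error of \(b_1\). Of course, with real data we only ever observe a *single* sample, which is exactly the problem the bootstrap addresses.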

Recall that we constructed three sampling distributions for the sample proportion \(\widehat{p}\) using shovels of size 25, 50, and 100 in Figure 7.12. We observed that as the sample size increased, the standard error decreased as evidenced by the narrowing sampling distribution.

The *standard error* of \(b_1\) similarly quantifies how much variation in the fitted slope \(b_1\) one would expect between different samples. So in our case, we can expect about 0.016 units of variation in the fitted slope for `bty_avg`. Recall that the `estimate` and `std_error` values play a key role in *inferring* the value of the unknown population slope \(\beta_1\) relating to *all* instructors.

In Section 10.4, we’ll perform a simulation using the `infer` package to construct the bootstrap distribution for \(b_1\) in this case. Recall from Subsection 8.7.1 that the bootstrap distribution is an *approximation* to the sampling distribution in that they have a similar shape. Since they have a similar shape, they have similar *standard errors*. However, unlike the sampling distribution, the bootstrap distribution is constructed from a *single* sample, which is a practice more aligned with what’s done in real life.

### 10.2.2 Test statistic

The fourth column of the regression table in Table 10.1, `statistic`, corresponds to a *test statistic* relating to the following *hypothesis test*:

\[ \begin{aligned} H_0 &: \beta_1 = 0\\ \text{vs } H_A&: \beta_1 \neq 0. \end{aligned} \]

Recall our terminology, notation, and definitions related to hypothesis tests we introduced in Section 9.2.

> A **hypothesis test** consists of a test between two competing hypotheses: (1) a **null hypothesis** \(H_0\) versus (2) an **alternative hypothesis** \(H_A\).
>
> A **test statistic** is a point estimate/sample statistic formula used for hypothesis testing.

Here, our *null hypothesis* \(H_0\) assumes that the population slope \(\beta_1\) is 0. If the population slope \(\beta_1\) is truly 0, then this is saying that there is *no true relationship* between teaching and “beauty” scores for *all* the instructors in our population. In other words, \(x\) = “beauty” score would have no associated effect on \(y\) = teaching score.
The *alternative hypothesis* \(H_A\), on the other hand, assumes that the population slope \(\beta_1\) is not 0, meaning it could be either positive or negative. This suggests either a positive or negative relationship between teaching and “beauty” scores. Recall we called such alternative hypotheses *two-sided*. By convention, all hypothesis testing for regression assumes two-sided alternatives.

Recall our “hypothesized universe” of no gender discrimination we *assumed* in our `promotions` activity in Section 9.1. Similarly here when conducting this hypothesis test, we’ll assume a “hypothesized universe” where there is no relationship between teaching and “beauty” scores. In other words, we’ll assume the null hypothesis \(H_0: \beta_1 = 0\) is true.

The `statistic` column in the regression table is a tricky one, however. It corresponds to a standardized *t-test statistic*, much like the *two-sample \(t\) statistic* we saw in Subsection 9.6.1 where we used a theory-based method for conducting hypothesis tests. In both these cases, the *null distribution* can be mathematically proven to be a *\(t\)-distribution*. Since such test statistics are tricky for individuals new to statistical inference to study, we’ll skip this and jump into interpreting the \(p\)-value. If you’re curious, we have included a discussion of this standardized *t-test statistic* in Subsection 10.5.1.

### 10.2.3 p-value

The fifth column of the regression table in Table 10.1, `p_value`, corresponds to the *p-value* of the hypothesis test \(H_0: \beta_1 = 0\) versus \(H_A: \beta_1 \neq 0\).

Again recalling our terminology, notation, and definitions related to hypothesis tests we introduced in Section 9.2, let’s focus on the definition of the \(p\)-value:

> A **p-value** is the probability of obtaining a test statistic just as extreme or more extreme than the observed test statistic *assuming the null hypothesis \(H_0\) is true*.

Recall that you can intuitively think of the \(p\)-value as quantifying how “extreme” the observed fitted slope of \(b_1\) = 0.067 is in a “hypothesized universe” where there is no relationship between teaching and “beauty” scores.

Following the hypothesis testing procedure we outlined in Section 9.4, since the \(p\)-value in this case is 0 (more precisely, it rounds to 0 at the table’s displayed precision), for any choice of significance level \(\alpha\) we would reject \(H_0\) in favor of \(H_A\). Using non-statistical language, this is saying: we reject the hypothesis that there is no relationship between teaching and “beauty” scores in favor of the hypothesis that there is. That is to say, the evidence suggests there is a significant relationship, one that is positive.

More precisely, however, the \(p\)-value corresponds to how extreme the observed test statistic of 4.09 is when compared to the appropriate *null distribution*. In Section 10.4, we’ll perform a simulation using the `infer` package to construct the null distribution in this case.
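The permutation idea behind that simulation can be sketched in Python (the book’s own code uses `infer`; the synthetic scores below are invented stand-ins for the real data): shuffling the “beauty” scores severs any relationship with the teaching scores, so refitting the slope to many shuffled datasets builds a null distribution for \(b_1\) under \(H_0: \beta_1 = 0\).

```python
import random

def fitted_slope(xs, ys):
    """Least-squares slope for paired lists xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(76)
# Synthetic stand-ins for beauty scores (xs) and teaching scores (ys)
xs = [random.uniform(2, 8) for _ in range(463)]
ys = [3.88 + 0.067 * x + random.gauss(0, 0.5) for x in xs]
observed = fitted_slope(xs, ys)

# Hypothesized universe H0: beta1 = 0 -- shuffling xs breaks any x-y link
null_slopes = []
for _ in range(1000):
    shuffled = random.sample(xs, k=len(xs))  # a random permutation of xs
    null_slopes.append(fitted_slope(shuffled, ys))

# Two-sided p-value: proportion of null slopes as extreme as the observed one
p_value = sum(abs(s) >= abs(observed) for s in null_slopes) / len(null_slopes)
```

With data like these, essentially none of the shuffled slopes reach the observed slope, mirroring the near-zero \(p\)-value in the regression table.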

An extra caveat here is that the results of this hypothesis test are only valid if certain “conditions for inference for regression” are met, which we’ll introduce shortly in Section 10.3.

### 10.2.4 Confidence interval

The two rightmost columns of the regression table in Table 10.1 (`lower_ci` and `upper_ci`) correspond to the endpoints of the 95% *confidence interval* for the population slope \(\beta_1\). Recall our analogy from Section 8.3 that “nets are to fish” what “confidence intervals are to population parameters.” The resulting 95% confidence interval for \(\beta_1\) of (0.035, 0.099) can be thought of as a range of plausible values for the population slope \(\beta_1\) of the linear relationship between teaching and “beauty” scores.

As we introduced in Subsection 8.5.2 on the precise and shorthand interpretation of confidence intervals, the statistically precise interpretation of this confidence interval is: “if we repeated this sampling procedure a large number of times, we expect about 95% of the resulting confidence intervals to capture the value of the population slope \(\beta_1\).” However, we’ll summarize this using our shorthand interpretation that “we’re 95% ‘confident’ that the true population slope \(\beta_1\) lies between 0.035 and 0.099.”

Notice in this case that the resulting 95% confidence interval for \(\beta_1\) of \((0.035, \, 0.099)\) does not contain a very particular value: \(\beta_1\) equals 0. Recall we mentioned that if the population regression slope \(\beta_1\) is 0, this is equivalent to saying there is *no* relationship between teaching and “beauty” scores. Since \(\beta_1\) = 0 is not in our plausible range of values for \(\beta_1\), we are inclined to believe that there, in fact, *is* a relationship between teaching and “beauty” scores and a positive one at that. So in this case, the conclusion about the population slope \(\beta_1\) from the 95% confidence interval matches the conclusion from the hypothesis test: evidence suggests that there is a meaningful relationship between teaching and “beauty” scores.

Recall from Subsection 8.5.3, however, that the *confidence level* is one of many factors that determine confidence interval widths. So for example, say we used a higher confidence level of 99% instead of 95%. The resulting confidence interval for \(\beta_1\) would be wider and thus might now include 0. The lesson to remember here is that any confidence-interval-based conclusion depends highly on the confidence level used.
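To see the effect of the confidence level numerically, here is a quick Python check using the table’s rounded values and the rule-of-thumb multipliers for bell-shaped distributions (1.96 for 95%, roughly 2.58 for 99%). Note that with these particular numbers the 99% interval is wider but happens to still exclude 0; whether a wider interval ends up capturing 0 depends on how close the narrower interval’s endpoint already is to 0.

```python
b1, se_b1 = 0.067, 0.016  # rounded estimate and standard error from the table

# Rule-of-thumb multipliers for bell-shaped distributions
ci_95 = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)  # narrower interval
ci_99 = (b1 - 2.58 * se_b1, b1 + 2.58 * se_b1)  # wider interval
```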

What are the calculations that went into computing the two endpoints of the 95% confidence interval for \(\beta_1\)?

Recall our sampling bowl example from Subsection 8.7.2 discussing `lower_ci` and `upper_ci`. Since the sampling and bootstrap distributions of the sample proportion \(\widehat{p}\) were roughly normal, we could use the rule of thumb for bell-shaped distributions from Appendix A.2 to create a 95% confidence interval for \(p\) with the following equation:

\[\widehat{p} \pm \text{MoE}_{\widehat{p}} = \widehat{p} \pm 1.96 \cdot \text{SE}_{\widehat{p}} = \widehat{p} \pm 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]

We can generalize this to other point estimates that have roughly normally shaped sampling and/or bootstrap distributions:

\[\text{point estimate} \pm \text{MoE} = \text{point estimate} \pm 1.96 \cdot \text{SE}.\]

We’ll show in Section 10.4 that the sampling/bootstrap distribution for the fitted slope \(b_1\) is in fact bell-shaped as well. Thus we can construct a 95% confidence interval for \(\beta_1\) with the following equation:

\[b_1 \pm \text{MoE}_{b_1} = b_1 \pm 1.96 \cdot \text{SE}_{b_1}.\]

What is the value of the standard error \(\text{SE}_{b_1}\)? It is in fact in the third column of the regression table in Table 10.1: 0.016. Thus

\[ \begin{aligned} b_1 \pm 1.96 \cdot \text{SE}_{b_1} &= 0.067 \pm 1.96 \cdot 0.016 = 0.067 \pm 0.031\\ &= (0.036, 0.098) \end{aligned} \]

This closely matches the \((0.035, 0.099)\) confidence interval in the last two columns of Table 10.1.
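The arithmetic above is easy to verify directly (a short Python check using the rounded table values; the book’s own code is in R):

```python
b1, se_b1 = 0.067, 0.016           # fitted slope and its standard error
moe = 1.96 * se_b1                 # margin of error, about 0.031
lower, upper = b1 - moe, b1 + moe  # endpoints of the 95% confidence interval
```

Rounding `lower` and `upper` to three decimal places gives (0.036, 0.098), matching the calculation above.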

Much like hypothesis tests, however, the results of this confidence interval also are only valid if the “conditions for inference for regression” to be discussed in Section 10.3 are met.

### 10.2.5 How does R compute the table?

Since we didn’t perform the simulation to get the values of the standard error, test statistic, \(p\)-value, and endpoints of the 95% confidence interval in Table 10.1, you might be wondering how these values were computed. What did R do behind the scenes? Does R run simulations like we did using the `infer` package in Chapters 8 and 9 on confidence intervals and hypothesis testing?

The answer is no! Much like the theory-based method for constructing confidence intervals you saw in Subsection 8.7.2 and the theory-based hypothesis test you saw in Subsection 9.6.1, there exist mathematical formulas that allow you to construct confidence intervals and conduct hypothesis tests for inference for regression. These formulas were derived in a time when computers didn’t exist, so it would’ve been impossible to run the extensive computer simulations we have in this book. We present these formulas in Subsection 10.5.1 on “theory-based inference for regression.”

In Section 10.4, we’ll go over a simulation-based approach to constructing confidence intervals and conducting hypothesis tests using the `infer` package. In particular, we’ll convince you that the bootstrap distribution of the fitted slope \(b_1\) is indeed bell-shaped.