## D.9 Chapter 9 Solutions

**(LC9.1)** Conduct the same hypothesis test and confidence interval analysis comparing male and female promotion rates using the median rating instead of the mean rating. What was different and what was the same?

**Solution**:

**(LC9.2)** Why are we relatively confident that the distributions of the sample proportions will be good approximations of the population distributions of promotion proportions for the two genders?

**Solution**:

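Because the sample is representative of the population, the promotion proportions we observe for each gender in the sample should be close to the corresponding promotion proportions in the population.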

**(LC9.3)** Using the definition of *p-value*, write in words what the \(p\)-value represents for the hypothesis test comparing the promotion rates for males and females.

**Solution**:

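The \(p\)-value is the probability of observing a difference in promotion rates between males and females at least as large as the one observed in the sample, assuming the null hypothesis \(H_0\) is true, that is, assuming there is no difference between the promotion rates of males and females in the population. It is not the probability that the two promotion rates are the same.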
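The definition above can be made concrete with a small base R sketch. The helper `right_tail_p_value()` and the toy null statistics are hypothetical and only illustrate the idea: the p-value is the proportion of statistics simulated under \(H_0\) that are at least as large as the observed statistic (a right-tailed test, matching an alternative of "males are promoted at a higher rate").

```r
# Hypothetical helper: right-tailed, simulation-based p-value.
# null_stats: test statistics simulated assuming H0 is true
# obs_stat:   the test statistic observed in the actual sample
right_tail_p_value <- function(null_stats, obs_stat) {
  # Proportion of null statistics at least as large as the observed one
  mean(null_stats >= obs_stat)
}

# Toy null distribution with made-up values (not the promotions data)
toy_null <- c(-0.2, -0.1, 0, 0, 0.1, 0.2, 0.3, 0.3)
right_tail_p_value(toy_null, obs_stat = 0.3)  # 2 of 8 values -> 0.25
```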

**(LC9.4)** Describe in a paragraph how we used Allen Downey’s diagram to conclude if a statistical difference existed between the promotion rate of males and females using this study.

**Solution**:

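We used the `promotions` data frame as the input data and computed the test statistic: the observed difference in promotion rates between males and females. The \(H_0\) model is "there is no difference between the promotion rates of males and females." Under this model, we used `infer` commands to simulate many datasets by permuting the gender labels, computing the test statistic for each one to build the null distribution. Comparing the observed test statistic to this null distribution gave a small \(p\)-value, so we rejected the \(H_0\) model and concluded that there is a statistically significant difference between the promotion rates of males and females.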
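A minimal sketch of this pipeline with `infer`, assuming the `promotions` data frame from the `moderndive` package (with a `decision` column whose success level is `"promoted"` and a `gender` column with levels `"male"` and `"female"`). The seed and `reps = 1000` are arbitrary choices, so the simulated \(p\)-value will vary slightly between runs and seeds.

```r
library(dplyr)
library(moderndive)
library(infer)

set.seed(2024)  # arbitrary seed, only for reproducibility

# Test statistic from the observed data: p_male - p_female
obs_diff_prop <- promotions %>%
  specify(formula = decision ~ gender, success = "promoted") %>%
  calculate(stat = "diff in props", order = c("male", "female"))

# Null distribution: shuffle gender labels under H0 (independence)
null_distribution <- promotions %>%
  specify(formula = decision ~ gender, success = "promoted") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("male", "female"))

# Right-tailed p-value: how often a shuffled difference is >= the observed one
p_value_promotions <- null_distribution %>%
  get_p_value(obs_stat = obs_diff_prop, direction = "right")
p_value_promotions
```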

**(LC9.5)** What is wrong about saying, “The defendant is innocent.” based on the US system of criminal trials?

**Solution**:

Failing to prove that the defendant is guilty is not the same as proving that the defendant is innocent. A "not guilty" verdict only means there was not enough evidence to convict, and there is always a possibility of making errors in the trial.

**(LC9.6)** What is the purpose of hypothesis testing?

**Solution**:

The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about a parameter. (source: https://personal.utdallas.edu/~scniu/OPRE-6301/documents/Hypothesis_Testing.pdf)

**(LC9.7)** What are some flaws with hypothesis testing? How could we alleviate them?

**Solution**:

The conventional 0.05 threshold for \(p\)-values can tempt researchers to run multiple tests (for example, repeated resampling) until they obtain a \(p\)-value below it, and then treat that result as validation. The threshold is also somewhat arbitrary (if a \(p\)-value is 0.051, does that really mean there is no statistical significance?), so trusting it too much can lead to imprecise conclusions. To alleviate this problem, keep in mind that a small \(p\)-value can be the result of a "lucky" sample that is not truly representative, and repeat the hypothesis test with multiple samplings before drawing conclusions.
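How quickly repeated testing inflates the chance of a "significant" result can be sketched in base R. This assumes, for simplicity, that the tests are independent and that \(H_0\) is true for every one of them, so each \(p\)-value is uniformly distributed on [0, 1]. The helper `familywise_error()`, the number of tests, and the number of simulated studies are hypothetical choices for illustration.

```r
# Probability that at least one of k independent tests gives p < alpha
# when H0 is true for all of them: 1 - (1 - alpha)^k
familywise_error <- function(k, alpha = 0.05) {
  1 - (1 - alpha)^k
}

familywise_error(1)   # 0.05
familywise_error(20)  # about 0.64

# Simulation check: 10,000 hypothetical studies, each running 20 tests
# where H0 is true, so every p-value is drawn uniformly on [0, 1]
set.seed(2024)
min_p <- replicate(10000, min(runif(20)))
mean(min_p < 0.05)    # should be close to familywise_error(20)
```

Even though every null hypothesis is true, running 20 tests gives roughly a 64% chance that at least one falls below 0.05.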

**(LC9.8)** Consider two \(\alpha\) significance levels of 0.1 and 0.01. Of the two, which would lead to a more *liberal* hypothesis testing procedure? In other words, one that will, all things being equal, lead to more rejections of the null hypothesis \(H_0\).

**Solution**:

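The larger \(\alpha\) of 0.1 leads to a more liberal hypothesis testing procedure. We reject the null hypothesis \(H_0\) whenever the \(p\)-value is less than \(\alpha\), so a larger \(\alpha\) lets more \(p\)-values fall below the threshold and therefore leads to more rejections.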
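A small base R illustration of this decision rule, using made-up \(p\)-values from five hypothetical tests. The helper `rejections_at()` is hypothetical; it simply counts how many \(p\)-values fall below a given \(\alpha\).

```r
# Made-up p-values from five hypothetical tests
p_values <- c(0.003, 0.02, 0.04, 0.08, 0.30)

# Decision rule: reject H0 whenever the p-value is less than alpha
rejections_at <- function(p, alpha) sum(p < alpha)

rejections_at(p_values, alpha = 0.10)  # 4 rejections
rejections_at(p_values, alpha = 0.01)  # 1 rejection
```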

**(LC9.9)** Conduct the same analysis comparing action movies versus romantic movies using the median rating instead of the mean rating. What was different and what was the same?

**Solution**:
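One way to set up this analysis is to repeat the `infer` pipeline with `stat = "diff in medians"` in place of `stat = "diff in means"`. The sketch below assumes the `movies_sample` data frame from the `moderndive` package, with a numeric `rating` column and a `genre` column with levels `"Action"` and `"Romance"`. The seed and number of replicates are arbitrary, and no results are claimed here; compare the observed statistic, null distribution, and \(p\)-value you get with those from the analysis of means.

```r
library(dplyr)
library(moderndive)
library(infer)

set.seed(2024)  # arbitrary seed, only for reproducibility

# Observed difference in median ratings: Action minus Romance
obs_diff_medians <- movies_sample %>%
  specify(formula = rating ~ genre) %>%
  calculate(stat = "diff in medians", order = c("Action", "Romance"))

# Null distribution of the difference in medians, shuffling genre labels
null_distribution_medians <- movies_sample %>%
  specify(formula = rating ~ genre) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in medians", order = c("Action", "Romance"))

# Two-sided p-value, matching the two-sided alternative used for the means
p_value_medians <- null_distribution_medians %>%
  get_p_value(obs_stat = obs_diff_medians, direction = "both")
p_value_medians
```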

**(LC9.10)** What conclusions can you make from viewing the faceted histogram looking at `rating` versus `genre` that you couldn't see when looking at the boxplot?

**Solution**:

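The faceted histogram shows the full shape of the distribution of `rating` within each `genre`, such as its modality, its skew, and how many movies fall into each range of ratings (and therefore how many movies are in each genre). The boxplot only summarizes each distribution with its five-number summary and outliers, so these details cannot be seen from it.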
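The two views can be produced side by side with `ggplot2`. This sketch assumes the `movies_sample` data frame from the `moderndive` package; the bin width and facet layout are illustrative choices, not necessarily those used in the chapter.

```r
library(ggplot2)
library(moderndive)

# Boxplot: five-number summary (plus outliers) of rating for each genre
boxplot_movies <- ggplot(movies_sample, aes(x = genre, y = rating)) +
  geom_boxplot() +
  labs(x = "Genre", y = "IMDb rating")

# Faceted histogram: full shape of the rating distribution within each genre
hist_movies <- ggplot(movies_sample, aes(x = rating)) +
  geom_histogram(binwidth = 1, color = "white") +
  facet_wrap(~ genre, nrow = 2) +
  labs(x = "IMDb rating")

boxplot_movies
hist_movies
```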

**(LC9.11)** Describe in a paragraph how we used Allen Downey’s diagram to conclude if a statistical difference existed between mean movie ratings for action and romance movies.

**Solution**:

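We used the `movies_sample` data frame as the input data and computed the test statistic: the observed difference in mean ratings between action and romance movies. The \(H_0\) model is "there is no difference between the mean ratings of action and romance movies." Under this model, we used `infer` commands to simulate many datasets by permuting the genre labels, computing the test statistic for each one to build the null distribution. Comparing the observed test statistic to this null distribution gave a small \(p\)-value, so we rejected the \(H_0\) model and concluded that there is a statistically significant difference between the mean ratings of action and romance movies.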
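A minimal sketch of this pipeline with `infer`, assuming the `movies_sample` data frame from the `moderndive` package (numeric `rating`, and `genre` with levels `"Action"` and `"Romance"`) and a two-sided alternative. The seed and `reps = 1000` are arbitrary, so the simulated \(p\)-value will differ slightly from run to run.

```r
library(dplyr)
library(moderndive)
library(infer)

set.seed(2024)  # arbitrary seed, only for reproducibility

# Test statistic from the observed data: mean(Action) - mean(Romance)
obs_diff_means <- movies_sample %>%
  specify(formula = rating ~ genre) %>%
  calculate(stat = "diff in means", order = c("Action", "Romance"))

# Null distribution: shuffle genre labels under H0 (independence)
null_distribution_movies <- movies_sample %>%
  specify(formula = rating ~ genre) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("Action", "Romance"))

# Two-sided p-value, since the alternative is "the means differ"
p_value_movies <- null_distribution_movies %>%
  get_p_value(obs_stat = obs_diff_means, direction = "both")
p_value_movies
```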

**(LC9.12)** Why are we relatively confident that the distributions of the sample ratings will be good approximations of the population distributions of ratings for the two genres?

**Solution**:

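Because the sample is representative of the population, the distribution of ratings we observe for each genre in the sample should be close to the corresponding distribution of ratings in the population.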

**(LC9.13)** Using the definition of \(p\)-value, write in words what the \(p\)-value represents for the hypothesis test comparing the mean rating of romance to action movies.

**Solution**:

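The \(p\)-value is the probability of observing a difference in mean ratings between action and romance movies at least as extreme as the one observed in the sample (in either direction, since the test is two-sided), assuming the null hypothesis \(H_0\) is true, that is, assuming there is no difference between the mean ratings of action and romance movies in the population. It is not the probability that \(H_0\) is true.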
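A base R sketch of a two-sided, simulation-based \(p\)-value. The helper `two_sided_p_value()` and the toy values are hypothetical. It uses one common convention, assumed here: double the smaller of the two tail proportions and cap the result at 1.

```r
# Hypothetical helper: two-sided, simulation-based p-value.
# Doubles the smaller tail proportion, capped at 1.
two_sided_p_value <- function(null_stats, obs_stat) {
  left  <- mean(null_stats <= obs_stat)  # proportion in the left tail
  right <- mean(null_stats >= obs_stat)  # proportion in the right tail
  min(1, 2 * min(left, right))
}

# Toy null distribution with made-up values (not the movies data)
toy_null <- c(-0.3, -0.2, -0.1, 0, 0, 0.1, 0.2, 0.3, 0.4, 0.5)
two_sided_p_value(toy_null, obs_stat = 0.4)  # 2 * (2 / 10) = 0.4
```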

**(LC9.14)** What is the value of the \(p\)-value for the hypothesis test comparing the mean rating of romance to action movies?

**Solution**:

The \(p\)-value here is \(0.004\).

**(LC9.15)** Test your data wrangling knowledge and EDA skills:

- Use `dplyr` and `tidyr` to create the necessary data frame focused on only action and romance movies (but not both) from the `movies` data frame in the `ggplot2movies` package.
- Make a boxplot and a faceted histogram of this population data comparing ratings of action and romance movies from IMDb.
- Discuss how these plots compare to the similar plots produced for the `movies_sample` data.

**Solution**:

- Use `dplyr` and `tidyr` to create the necessary data frame focused on only action and romance movies (but not both) from the `movies` data frame in the `ggplot2movies` package.
- Make a boxplot and a faceted histogram of this population data comparing ratings of action and romance movies from IMDb.

Both plots need a tidy data frame with a single `genre` column.
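A minimal sketch of both steps, assuming the `movies` data frame from `ggplot2movies`, where `Action` and `Romance` are 0/1 indicator columns. Keeping rows where `Action + Romance == 1` selects movies in exactly one of the two genres, and `pivot_longer()` reshapes the two indicators into one tidy `genre` column. The bin width and facet layout are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggplot2movies)

# Movies that are Action or Romance, but not both, with one tidy genre column
action_romance <- movies %>%
  select(title, year, rating, Action, Romance) %>%
  filter(Action + Romance == 1) %>%
  pivot_longer(cols = c(Action, Romance),
               names_to = "genre", values_to = "is_genre") %>%
  filter(is_genre == 1) %>%
  select(-is_genre)

# Boxplot of ratings by genre for the population data
box_pop <- ggplot(action_romance, aes(x = genre, y = rating)) +
  geom_boxplot() +
  labs(x = "Genre", y = "IMDb rating")

# Faceted histogram of ratings by genre for the population data
hist_pop <- ggplot(action_romance, aes(x = rating)) +
  geom_histogram(binwidth = 1, color = "white") +
  facet_wrap(~ genre, nrow = 2) +
  labs(x = "IMDb rating")

box_pop
hist_pop
```

Since every remaining movie belongs to exactly one genre, each movie appears in exactly one row of `action_romance`, which is what the plots need.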