Introduction for instructors


Here are some resources to help you use ModernDive:

  1. We’ve included review questions posed as Learning checks. You can find solutions to all Learning checks in Appendix D of the online version of the book.
  2. Dr. Jenny Smetzer and Albert Y. Kim have written a series of labs and problem sets. You can find them at
  3. You can see the webpages for two courses that use ModernDive:

Why did we write this book?

This book is inspired by

  • Mathematical Statistics with Resampling and R (Chihara and Hesterberg 2011)
  • OpenIntro: Intro Stat with Randomization and Simulation (Diez, Barr, and Çetinkaya-Rundel 2014)
  • R for Data Science (Grolemund and Wickham 2017)

The first book, designed for upper-level undergraduates and graduate students, provides an excellent resource on how to use resampling to impart statistical concepts like sampling distributions using computation instead of large-sample approximations and other mathematical formulas. The last two books are free options for learning about introductory statistics and data science, providing an alternative to the many traditionally expensive introductory statistics textbooks.

When we looked over existing introductory statistics textbooks, we found none that incorporated many newly developed R packages directly into the text, in particular the packages in the tidyverse, such as ggplot2, dplyr, tidyr, and readr, which are the focus of this book’s first part, “Data Science with tidyverse.”

Additionally, there wasn’t an open-source and easily reproducible textbook available that exposed new learners to all four of the learning goals we listed in the “Introduction for students” subsection. We wanted to write a book that could develop theory via computational techniques and help novices master the R language in doing so.

Who is this book for?

This book is intended for instructors of traditional introductory statistics classes using RStudio who would like to inject more data science topics into their syllabus. RStudio can be used in either the server version or the desktop version. (This is discussed further in Subsection 1.1.1.) We assume that students taking the class will have no prior algebra, calculus, or programming/coding experience.

Here are some principles and beliefs we kept in mind while writing this text. If you agree with them, this is the book for you.

  1. Blur the lines between lecture and lab
    • With increased availability and accessibility of laptops and open-source non-proprietary statistical software, the strict dichotomy between lab and lecture can be loosened.
    • It’s much harder for students to understand the importance of using software if they only use it once a week or less. They forget the syntax in much the same way someone learning a foreign language forgets the grammar rules. Frequent reinforcement is key.
  2. Focus on the entire data/science research pipeline
  3. It’s all about the data
    • We leverage R packages for rich, real, and realistic datasets that are also easy to load into R, such as the nycflights13 and fivethirtyeight packages.
    • We believe that data visualization is a “gateway drug” for statistics and that the grammar of graphics as implemented in the ggplot2 package is the best way to impart such lessons. However, we often hear: “You can’t teach ggplot2 for data visualization in intro stats!” We, like David Robinson, are much more optimistic and have found our students have been largely successful in learning it.
    • dplyr has made data wrangling much more accessible to novices, and hence much more interesting datasets can be explored.
  4. Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas
    • Instead of using formulas, large-sample approximations, and probability tables, we teach statistical concepts using simulation-based inference.
    • This allows for a de-emphasis of traditional probability topics, freeing up room in the syllabus for other topics. We also provide bridges to these mathematical concepts to help relate the traditional topics to more modern approaches.
  5. Don’t fence off students from the computation pool, throw them in!
    • Computing skills are essential to working with data in the 21st century. Given this fact, we feel that to shield students from computing is to ultimately do them a disservice.
    • We are not teaching a course on coding/programming per se, but rather just enough of the computational and algorithmic thinking necessary for data analysis.
  6. Complete reproducibility and customizability
    • We are frustrated when textbooks give examples, but not the source code and the data itself. We give you the source code for all examples as well as the whole book! While we have made choices to occasionally hide the code that produces more complicated figures, reviewing the book’s GitHub repository will provide you with all the code (see below).
    • Ultimately the best textbook is one you’ve written yourself. You know best your audience, their background, and their priorities. You know best your own style and the types of examples and problems you like best. Customization is the ultimate end. We encourage you to take what we’ve provided and make it work for your own needs. For more about how to make this book your own, see “About this book” later in this Preface.
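
As a rough illustration of principle 4, here is a minimal sketch of a resampling-based confidence interval that uses no formulas or probability tables. It uses base R only and simulated data; the `commute` vector and the choice of 1000 resamples are assumptions made for this example, and it is not necessarily the code or approach used in the book’s chapters.

```r
# Hypothetical data: 50 simulated commute times in minutes (not from the book).
set.seed(76)
commute <- rexp(50, rate = 1 / 30)

# Bootstrap: resample the data with replacement 1000 times,
# recording the mean of each resample.
boot_means <- replicate(1000, mean(sample(commute, replace = TRUE)))

# Percentile-method 95% confidence interval for the population mean:
# the middle 95% of the resampled means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci

# A histogram of boot_means approximates the sampling distribution of the mean.
hist(boot_means, main = "Bootstrap distribution of the sample mean")
```

Students can read the interval directly off the resampled means, which lets the idea of a sampling distribution come before any large-sample formula for a standard error.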


Chihara, Laura M., and Tim C. Hesterberg. 2011. Mathematical Statistics with Resampling and R. First. Hoboken, NJ: John Wiley & Sons.

Diez, David M., Christopher D. Barr, and Mine Çetinkaya-Rundel. 2014. Introductory Statistics with Randomization and Simulation. First. Scotts Valley, CA: CreateSpace Independent Publishing Platform.

Grolemund, Garrett, and Hadley Wickham. 2017. R for Data Science. First. Sebastopol, CA: O’Reilly Media.