Experimentology
An Open Science Approach to Experimental Psychology Methods
Preface
As scientists and practitioners, we often want to create generalizable, causal theories of human behavior. As it turns out, experiments—in which we use random assignment to measure a causal effect—are an unreasonably effective tool to help with this task. But how should we go about doing good experiments?
This book provides an introduction to the workflow of the experimental researcher working in psychology or the behavioral sciences more broadly. The organization of the book is sequential, from the planning stages of the research process through design, data gathering, analysis, and reporting. We introduce these concepts via narrative examples from a range of subdisciplines, including cognitive, developmental, and social psychology. Throughout, we also illustrate the pitfalls that led to the “replication crisis” in psychology.
To help researchers avoid these pitfalls, we advocate for an open science-based approach in which transparency is integral to the entire experimental workflow. We provide readers with guidance on preregistration, project management, data sharing, and reproducible report writing.
The story of this book
Experimental Methods (Psych 251) is the foundational course for incoming graduate students in the Stanford psychology department. The course goal is to orient students to the nuts and bolts of doing behavioral experiments, including how to plan and design a solid experiment and how to avoid common pitfalls regarding design, measurement, and sampling.
Almost all student coursework both before and in graduate school deals with the content of their research, including theories and results in their areas of focus. In contrast, our course is sometimes the only one that deals with the process of research, from big questions about why we do experiments and what it means to make a causal inference all the way to the tiny details of project organization, such as what to name your directories and how to make sure you don’t lose your data in a computer crash.
This observation leads to our book’s title. “Experimentology” is the set of practices, findings, and approaches that enable the construction of robust, precise, and generalizable experiments.
The centerpiece of the Experimental Methods course is a replication project, reflecting a teaching model first described in Frank and Saxe (2012) and later expanded on in Hawkins et al. (2018) . Each student chooses a published experiment in the literature and collects new data on a preregistered version of the same experimental paradigm, comparing their result to the original publication. Over the course of the quarter, we walk through how to set up a replication experiment, how to preregister confirmatory analyses, and how to write a reproducible report on the findings. The project teaches concepts like reliability and validity, which allow students to analyze choices that the original experimenters made—often choices that could have been made differently in hindsight!
At the end of the course, we reap the harvest of these projects. The project presentations are a wonderful demonstration of both how much the students can accomplish in a quarter and how tricky it can be to reproduce (redo calculations in the original data) and replicate (recover similar results in new data) the published literature. Often our replication success rate for the course hovers just above 50%, an outcome that can be disturbing or distressing for students who assume that the published literature reports the absolute truth.
This book is an attempt to distill some of the lessons of the course (and students’ course projects) into a textbook. We’ll tell the story of the major shifts in psychology that have come about in the last ten years, including both the “replication crisis” and the positive methodological reforms that have resulted from it. Using this story as motivation, we will highlight the importance of transparency during all aspects of the experimental process from planning to dissemination of materials, data, and code.
What this book is and isn’t about
This book is about psychology experiments. These will typically be short studies conducted online or in a single visit to a lab, often—though certainly not always—with a convenience sample. When we say “experiments” here, we mean randomized experiments where some aspect of the participants’ experience is manipulated by the experimenter and then some outcome variable is measured.1
1 We use bold to indicate the introduction of new technical terms.
The central thesis of the book is that:
Experiments are intended to make maximally unbiased, generalizable, and precise estimates of specific causal effects.
We’ll explore the implications of this thesis for a host of topics, including causal inference, experimental design, measurement, sampling, preregistration, data analysis, and many others.
Because our focus is on experiments, we won’t be talking much about observational designs, survey methods, or qualitative research; these are important tools and are appropriate for a whole host of questions, but they aren’t our focus here. We also won’t go into depth about the many fascinating methodological and statistical issues brought up by single-participant case studies, longitudinal research, field studies, or other methodological variants. Many of the concerns we raise are still important for these types of studies, but some of our advice won’t transfer to these less common designs.
Even for students who are working on nonexperimental research, we expect that a substantial part of the book content will still be useful, including chapters on replication (3 Replication), ethics (4 Ethics), statistics (chapters 5 Estimation, 6 Inference, 7 Models), sampling (10 Sampling), project management (13 Project management), and reporting (chapters 14 Writing, 15 Visualization, 16 Meta-analysis).
In our writing, we presuppose that readers have some background in psychology, at least at an introductory level. In addition, although we introduce a number of statistical topics, readers might find these sections more accessible with an undergraduate statistics course under their belt. Finally, our examples are written in the R statistical programming language, and for chapters on statistics and visualization especially (chapters 5 Estimation–7 Models, 15 Visualization–16 Meta-analysis), familiarity with R will be helpful for understanding the code. None of these prerequisites are necessary to read the book, but we offer them so that readers can calibrate their expectations.
How to use this book
The book is organized into five main parts, mirroring the timeline of an experiment: (1) “Foundations,” (2) “Statistics,” (3) “Planning,” (4) “Execution,” and (5) “Reporting.” We hope that this organization makes it well suited for teaching or for use as a reference book.2
2 If you are an instructor who is planning to adopt the book for a course, you might be interested in our resources for instructors, including sample course schedules, in appendix A — Instructor’s guide.
The book is designed for a course for graduate students or advanced undergraduates, but the material is also suitable for self-study by anyone interested in experimental methods, whether in academic psychology or any other context—in or out of academia—in which behavioral experimentation is relevant. We also hope that some readers will come to particular chapters of the book because of an interest in specific topics like measurement (8 Measurement) or sampling (10 Sampling) and will be able to use those chapters as stand-alone references. And finally, for those interested in the “replication crisis” and subsequent reforms, chapters 3 Replication, 11 Preregistration, and 13 Project management will be especially interesting.
Ultimately, we want to give you what you need to plan and execute your own study! Instead of enumerating different approaches, we try to provide a single coherent—and often quite opinionated—perspective, using marginal notes and references to give pointers to more advanced materials or alternative approaches. Throughout, we offer:
- case study boxes that illustrate the central concepts of a chapter
- accident report boxes describing examples where poor research practices led to issues in the literature
- depth boxes providing simulations, linkages to advanced techniques, or more nuanced discussion
While case studies are often integral to the chapters, the other boxes can typically be skipped without issue.
Themes
We highlight four major cross-cutting themes: transparency, measurement precision, bias reduction, and generalizability.3
3 Themes are noted using small caps.
- transparency: For experiments to be reproducible, other researchers need to be able to determine exactly what you did. Thus, every stage of the research process should be guided by a primary concern for transparency. For example, preregistration creates transparency into the researcher’s evolving expectations and thought processes; releasing open materials and analysis scripts creates transparency into the details of the procedure.
- measurement precision: We want researchers to start planning an experiment by thinking: “What causal effect do I want to measure?” and to make planning, sampling, design, and analytic choices that maximize the precision of this measurement. A downstream consequence of this mindset is that we move away from a focus on dichotomized inferences about statistical significance and toward analytic and meta-analytic models that focus on continuous effect sizes and confidence intervals.
- bias reduction: While precision refers to random error in a measurement, measurements also have systematic sources of error that bias them away from the true quantity. In our samples, analyses, and experimental designs, and in the literature, we need to think carefully about sources of bias in the quantity being estimated.
- generalizability: Complex behaviors are rarely universal across all settings and populations, and any given experiment can only hope to cover a small slice of the possible conditions where a behavior of interest takes place. Psychologists must therefore consider the generalizability of their findings at every stage of the process, from stimulus selection and sampling procedures to analytic methods and reporting.
Throughout the book, we will return to these four themes again and again as we discuss how the decisions made by the experimenter at every stage of design, data gathering, and analysis bear on the inferences that can be made about the results. The introduction of each chapter highlights connections to specific themes.
The software toolkit for this book
We introduce and advocate for an approach to reproducible study planning, analysis, and writing. This approach depends on an ecosystem of open-source software tools, which we introduce in the appendices:4
4 These appendices are available online at https://experimentology.io but not in the print version of the book, since their content is best viewed in the web format.
- the R statistical programming language and the RStudio integrated development environment
- version control using Git and GitHub for allowing collaboration on text documents like code, prose, and data, storing and integrating contributions over time (appendix B — Git and GitHub)
- the RMarkdown and Quarto publishing systems for creating reproducible reports that can be rendered to a variety of formats (appendix C — R Markdown and Quarto)
- the
tidyverse
family of R packages, which extend the functionality of base R with elegant tools for data wrangling, analysis, and visualization (appendix D — Tidyverse) - the
ggplot2
plotting package, which makes it easy to create flexible data visualizations for both confirmatory and exploratory data analyses (appendix E — ggplot)
Where appropriate, we provide code boxes that show the specific R code used to create our examples.
Onward!
Thanks for joining us for Experimentology! Whether you are casually browsing, doing readings for a course, or using the book as a reference in your own experimental work, we hope you find it useful. Throughout, we have tried to practice what we preach in terms of reproducibility, and so the full source code for the book is available at https://github.com/langcog/experimentology. We encourage you to browse, comment, and log issues or suggestions.5
5 The best way to give us specific feedback is to create an issue on our github page at https://github.com/langcog/experimentology/issues.
Citing this book
@book{experimentology2025,
author = {Frank, Michael C. and Braginsky, Mika and Cachia, Julie and Coles, Nicholas A. and Hardwicke, Tom E. and Hawkins, Robert D. and Mathur, Maya B. and Williams, Rondeline},
title = {{Experimentology: An Open Science Approach to Experimental Psychology Methods}},
year = {2025},
publisher = {Stanford University},
doi = {10.25936/3JP6-5M50},
note = {Also published by MIT Press, ISBN 978-0-262-55256-1}
}
For attribution, please cite this work as:
Acknowledgments
Thanks first and foremost to the many generations of students and TAs in Stanford Psych 251, who have collectively influenced the content of this book through their questions and interests.
Thanks to the staff at the MIT Press, especially Philip Laughlin and Amy Brand, for embracing a vision of a completely open web textbook that is also reviewed and published through a traditional press. We appreciate your support and flexibility.
We adapt the Contributor Roles (CRediT) Taxonomy6 to describe our contributions to this manuscript, and we recommend that you do so in your work as well.
6 Learn more at https://credit.niso.org.
- Preface
- Primary writer: Michael C. Frank
- Editor: Tom E. Hardwicke
- Chapter 1 Experiments
- Primary writer: Michael C. Frank
- Cowriter: Nicholas Coles
- Editor: Tom E. Hardwicke
- Chapter 2 Theories
- Primary writer: Michael C. Frank
- Editors: Nicholas Coles and Tom E. Hardwicke
- Chapter 3 Replication
- Primary writer: Michael C. Frank
- Editors: Maya B. Mathur, Tom E. Hardwicke, and Nicholas Coles
- Chapter 4 Ethics
- Primary writer: Rondeline Williams
- Cowriter: Michael C. Frank
- Editors: Tom E. Hardwicke and Julie Cachia
- Chapter 5 Estimation
- Cowriters: Maya B. Mathur, Nicholas Coles, and Michael C. Frank
- Editors: Julie Cachia and Tom E. Hardwicke
- Chapter 6 Inference
- Primary writer: Michael C. Frank
- Cowriter: Maya B. Mathur
- Editors: Julie Cachia and Tom E. Hardwicke
- Chapter 7 Models
- Cowriters: Maya B. Mathur and Michael C. Frank
- Editors: Tom E. Hardwicke and Robert D. Hawkins
- Chapter 8 Measurement
- Primary writer: Michael C. Frank
- Editors: Robert D. Hawkins, Tom E. Hardwicke, and Rondeline Williams
- Chapter 9 Design
- Primary writer: Michael C. Frank
- Editors: Nicholas Coles and Tom E. Hardwicke
- Chapter 10 Sampling
- Primary writer: Michael C. Frank
- Editors: Julie Cachia, Tom E. Hardwicke, and Maya B. Mathur
- Chapter 11 Preregistration
- Primary writer: Tom E. Hardwicke
- Editor: Michael C. Frank
- Chapter 12 Data collection
- Cowriters: Rondeline Williams and Michael C. Frank
- Editor: Tom E. Hardwicke
- Chapter 13 Project management
- Primary writer: Michael C. Frank
- Editor: Tom E. Hardwicke
- Chapter 14 Writing
- Primary writer: Tom E. Hardwicke
- Editor: Michael C. Frank
- Chapter 15 Visualization
- Primary writer: Robert D. Hawkins
- Editors: Michael C. Frank, Tom E. Hardwicke, and Mika Braginsky
- Chapter 16 Meta-analysis
- Cowriters: Nicholas Coles and Maya B. Mathur
- Editors: Michael C. Frank and Tom E. Hardwicke
- Conclusion
- Primary writer: Michael C. Frank
- Editor: Tom E. Hardwicke
- Appendix appendix A — Instructor’s guide
- Primary writer: Julie Cachia
- Editor: Michael C. Frank
- Appendix appendix B — Git and GitHub
- Primary writer: Julie Cachia
- Editor: Michael C. Frank
- Appendix appendix C — R Markdown and Quarto
- Primary writer: Michael C. Frank
- Editor: Julie Cachia
- Appendix appendix D — Tidyverse
- Primary writer: Michael C. Frank
- Editors: Julie Cachia and Mika Braginsky
- Appendix appendix E — ggplot
- Primary writer: Michael C. Frank
- Editors: Julie Cachia and Mika Braginsky
- Technical infrastructure
- Developers: Mika Braginsky and Natalie Braginsky