An Open Science Approach to Experimental Psychology Methods


Michael C. Frank

Mika Braginsky

Julie Cachia

Nicholas Coles

Tom E. Hardwicke

Robert D. Hawkins

Maya B. Mathur

Rondeline Williams


October 3, 2023


As scientists and practitioners, we often want to create generalizable, causal theories of human behavior. As it turns out, experiments – in which we use random assignment to measure a causal effect – are an unreasonably effective tool to help with this task. But how should we go about doing good experiments?

This book provides an introduction to the workflow of the experimental researcher working in psychology or the behavioral sciences more broadly. The organization of the book is sequential, from the planning stages of the research process through design, data gathering, analysis, and reporting. We introduce these concepts via narrative examples from a range of sub-disciplines, including cognitive, developmental, and social psychology. Throughout, we also illustrate the pitfalls that led to the “replication crisis” in psychology.

To help researchers avoid these pitfalls, we advocate for an approach grounded in open science, in which transparency is integral to the entire experimental workflow. We provide readers with guidance on preregistration, project management, data sharing, and reproducible report writing.

The story of this book

Experimental Methods (Psych 251) is the foundational course for incoming graduate students in the Stanford psychology department. For the last twelve years, one of us (Frank) has taught this course and most of us (Cachia, Coles, Hardwicke, Hawkins, Mathur, Williams) have TA’ed, taken, or otherwise contributed to the course. The course goal is to orient students to the nuts and bolts of doing behavioral experiments, including how to plan and design a solid experiment and how to avoid common pitfalls regarding design, measurement, and sampling.

Almost all student coursework both before and in graduate school deals with the content of their research, including theories and results in their areas of focus. In contrast, the Experimental Methods course is sometimes the only one that deals with the process of research, from big questions about why we do experiments and what it means to make a causal inference, all the way to the tiny details of project organization, like what to name your directories and how to make sure you don’t lose your data in a computer crash.

This observation leads to our book’s title. “Experimentology” is the set of practices, findings, and approaches that enable the construction of robust, precise, and generalizable experiments.

The centerpiece of the Experimental Methods course is a replication project, reflecting a teaching model first described in Frank and Saxe (2012) and later expanded on in Hawkins et al. (2018). Each student chooses a published experiment from the literature and collects new data on a preregistered version of the same experimental paradigm, comparing their result to the original publication. Over the course of the quarter, we walk through how to set up a replication experiment, how to preregister confirmatory analyses, and how to write a reproducible report on the findings. The project teaches concepts like reliability and validity, which allow students to analyze choices that the original experimenters made – often choices that could have been made differently in hindsight!

At the end of the course, we reap the harvest of these projects. The project presentations are a wonderful demonstration of both how much the students can accomplish in a quarter and also how tricky it can be to reproduce (redo the calculations on the original data) and replicate (recover similar results in new data) the published literature. Often our replication success rate for the course hovers just above 50%, an outcome that can be disturbing or distressing for students who assume that the published literature reports the absolute truth.

Figure 1: This book has fun stuff going on in the margins!

This book is an attempt to distill some of the lessons of the course (and students’ course projects) into a textbook. We’ll tell the story of the major shifts in psychology that have come about in the last ten years, including both the “replication crisis” (Open Science Collaboration 2015 et seq.) and the positive methodological reforms that have resulted from it. Using this story as motivation, we will highlight the importance of transparency during all aspects of the experimental process from planning to dissemination of materials, data, and code.

What this book is and isn’t about

This book is about psychology experiments. These will typically be short studies conducted online or in a single visit to a lab, often with a convenience population. When we say “experiments” here we mean randomized experiments where some aspect of the participants’ experience is manipulated by the experimenter and then some outcome variable is measured.1

  • 1 We use bold to indicate the introduction of new technical terms.

The central thesis of the book is that:

    Experiments are intended to make maximally unbiased, generalizable, and precise estimates of specific causal effects.

We’ll follow the implications of this thesis for a host of topics, including causal inference, experimental design, measurement, sampling, preregistration, data analysis, and many others.

Because our focus is on experiments, we won’t be talking much about observational designs, survey methods, or qualitative research; these are important tools and appropriate for a whole host of questions, but they aren’t our focus here. We also won’t go into depth about the many fascinating methodological and statistical issues brought up by single-participant case studies, longitudinal research, field studies, or other methodological variants. Many of the concerns we raise are still important for these types of studies, but some of our advice won’t transfer to these less common designs.

Even for students who are working on non-experimental research, we expect that a substantial part of the book content will still be useful, including chapters on replication (Chapter 3), ethics (Chapter 4), sampling (Chapter 10), project management (Chapter 13), and all of the material on reporting (Chapters 14, 15, 16).

In our writing, we presuppose that readers have some background in psychology, at least at an introductory level. In addition, although we discuss some statistical topics, readers might find these sections more accessible with an undergraduate statistics course under their belt. Finally, our examples are written in the R statistical programming language, and for chapters on statistics and visualization especially (Chapters 5, 6, 7, 15, 16), some familiarity with R will be helpful for understanding the code.

How to use this book

The book is organized into five main sections, mirroring the timeline of an experiment: 1) Foundations, 2) Statistics, 3) Design, 4) Execution, and 5) Reporting. We hope that this organization makes it well-suited for teaching or for use as a reference book.2

  • 2 If you are an instructor who is planning to adopt the book for a course, you might be interested in our resources for instructors, including sample course schedules, in Appendix A.

The book is designed for a course for graduate students or advanced undergraduates, but the material is also suitable for self-study by anyone interested in experimental methods, whether in academic psychology or any other context – in or out of academia – in which behavioral experimentation is relevant. We also hope that some readers will come to particular chapters of the book because of an interest in specific topics like measurement (Chapter 8) or sampling (Chapter 10) and will be able to use those chapters as standalone references. And finally, for those interested in the “replication crisis” and subsequent reforms, Chapters 2, 3, 11, 13 will be especially interesting.

Ultimately, we want to give you what you need to plan and execute your own study! Instead of enumerating different approaches, we try to provide a single coherent – and often quite opinionated – perspective, using marginal notes and references to give pointers to more advanced materials or alternative approaches. Throughout, we offer:

• Case studies that illustrate the central concepts of a chapter,
• Accident reports describing examples where poor research practices led to issues in the literature, and
• Depth boxes providing simulations, linkages to advanced techniques, or more nuanced discussion.

While case studies are often integral to the chapters, the other boxes can typically be skipped without issue.


We highlight four major cross-cutting themes for the book: transparency, precision, bias reduction, and generalizability.

• Transparency: For experiments to be reproducible, other researchers need to be able to determine exactly what you did. Thus, every stage of the research process should be guided by a primary concern for transparency. For example, preregistration creates transparency into the researcher’s evolving expectations and thought processes; releasing open materials and analysis scripts creates transparency into the details of the procedure.
• Precision: We want researchers to start planning an experiment by thinking “what causal effect do I want to measure” and to make planning, sampling, design, and analytic choices that maximize the precision of this measurement. A downstream consequence of this mindset is that we move away from a focus on dichotomized inferences about statistical significance and towards analytic and meta-analytic models that focus on continuous effect sizes and confidence intervals (Cumming 2014).
• Bias reduction: While precision refers to random error in a measurement, measurements also have systematic sources of error that bias them away from the true quantity. In our samples, analyses, experimental designs, and in the literature, we need to think carefully about sources of bias in the quantity being estimated.
• Generalizability: Complex behaviors are rarely universal across all settings and populations, and any given experiment can only hope to cover a small slice of the possible conditions where a behavior of interest takes place (Yarkoni 2020). Psychologists must therefore consider the generalizability of their findings at every stage of the process, from stimulus selection and sampling procedures, to analytic methods and reporting.

Throughout the book, we will return to the important relationships between these four concepts, and how the decisions made by the experimenter at every stage of design, data gathering, and analysis bear on the inferences that can be made about the results.
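To make the precision theme concrete, here is a small sketch in R (with simulated data; the 0.5 standard-deviation effect and group sizes are hypothetical, chosen only for illustration) of estimating a standardized effect size with a confidence interval, rather than reporting only a dichotomous significance test:

```r
set.seed(1)

# simulate a two-group experiment (hypothetical true effect of 0.5 SD)
control   <- rnorm(100, mean = 0)
treatment <- rnorm(100, mean = 0.5)

# Welch's t-test gives the raw mean difference and its 95% CI
fit <- t.test(treatment, control)

# a simple standardized effect size (Cohen's d, pooled SD)
d <- (mean(treatment) - mean(control)) /
  sqrt((var(treatment) + var(control)) / 2)

fit$conf.int  # 95% CI on the raw mean difference
d             # standardized effect size
```

Reporting the interval and the effect size keeps the focus on how large the effect is and how precisely it was measured, not just on whether p crossed a threshold.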

The software toolkit for this book

We introduce and advocate for an approach to reproducible study planning, analysis, and writing. This approach depends on an ecosystem of open-source software tools, which we introduce in the book’s appendices.3

  • 3 These appendices are available online at https://experimentology.io but not in the print version of the book, since their content is best viewed in the web format.

• The R statistical programming language and the RStudio integrated development environment,
• Version control using git and GitHub, allowing collaboration on text documents like code, prose, and data, storing and integrating contributions over time (Appendix B),
• The RMarkdown and Quarto tools for creating reproducible reports that can be rendered to a variety of formats (Appendix C),
• The tidyverse family of R packages, which extend the basic functionality of R with simple tools for data wrangling, analysis, and visualization (Appendix D), and
• The ggplot2 plotting package, which makes it easy to create flexible data visualizations for both confirmatory and exploratory data analyses (Appendix E).

Where appropriate, we provide code boxes that show the specific R code used to create our examples.
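As a small taste of this workflow (this example is not from the book itself; the data and the two-condition design are simulated for illustration), a typical tidyverse-plus-ggplot2 pipeline might look like this:

```r
library(tidyverse)  # loads dplyr, ggplot2, tibble, and friends

set.seed(123)

# simulated data from a hypothetical two-condition experiment
d <- tibble(
  condition = rep(c("control", "treatment"), each = 50),
  outcome   = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

# wrangle: condition means with approximate 95% confidence intervals
d_summary <- d |>
  group_by(condition) |>
  summarise(mean_outcome = mean(outcome),
            ci = 1.96 * sd(outcome) / sqrt(n()))

# visualize: plot the mean and interval for each condition
ggplot(d_summary,
       aes(x = condition, y = mean_outcome,
           ymin = mean_outcome - ci, ymax = mean_outcome + ci)) +
  geom_pointrange()
```

The same script handles data wrangling, analysis, and visualization, which is what makes the full analysis reproducible end to end.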


Thanks for joining us for Experimentology! Whether you are casually browsing, doing readings for a course, or using the book as a reference in your own experimental work, we hope you find it useful. Throughout, we have tried to practice what we preach in terms of reproducibility, and so the full source code for the book is available at https://github.com/langcog/experimentology. We encourage you to browse, comment, and log issues or suggestions.4

  • 4 If you want to log a specific issue, please feel free to create an issue on our GitHub page at https://github.com/langcog/experimentology/issues.

References

Cumming, Geoff. 2014. “The New Statistics: Why and How.” Psychological Science 25 (1): 7–29.
Frank, Michael C, and Rebecca Saxe. 2012. “Teaching Replication.” Perspectives on Psychological Science 7: 595–99.
Hawkins, Robert D, Eric N Smith, Carolyn Au, Juan Miguel Arias, Rhia Catapano, Eric Hermann, Martin Keil, et al. 2018. “Improving the Replicability of Psychological Science Through Pedagogy.” Advances in Methods and Practices in Psychological Science 1 (1): 7–18.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251).
Yarkoni, Tal. 2020. “The Generalizability Crisis.” Behavioral and Brain Sciences 45: 1–37.