An Open Science Approach to Experimental Psychology Methods


Michael C. Frank

Mika Braginsky

Julie Cachia

Nicholas Coles

Tom E. Hardwicke

Robert D. Hawkins

Maya B. Mathur

Rondeline Williams


October 3, 2023


As scientists and practitioners, we often want to create generalizable, causal theories of human behavior. As it turns out, experiments – in which we use random assignment to measure a causal effect – are an unreasonably effective tool to help with this task. But how should we go about doing good experiments?

This book provides an introduction to the workflow of the experimental researcher working in psychology or the behavioral sciences more broadly. The organization of the book is sequential, from the planning stages of the research process through design, data gathering, analysis, and reporting. We introduce these concepts via narrative examples from a range of sub-disciplines, including cognitive, developmental, and social psychology. Throughout, we also illustrate the pitfalls that led to the “replication crisis” in psychology.

To help researchers avoid these pitfalls, we advocate for an approach grounded in open science, in which transparency is integral to the entire experimental workflow. We provide readers with guidance on preregistration, project management, data sharing, and reproducible report writing.

The story of this book

Experimental Methods (Psych 251) is the foundational course for incoming graduate students in the Stanford psychology department. For the last twelve years, one of us (Frank) has taught this course and most of us (Cachia, Coles, Hardwicke, Hawkins, Mathur, Williams) have TA’ed, taken, or otherwise contributed to the course. The course goal is to orient students to the nuts and bolts of doing behavioral experiments, including how to plan and design a solid experiment and how to avoid common pitfalls regarding design, measurement, and sampling.

Almost all student coursework both before and in graduate school deals with the content of their research, including theories and results in their areas of focus. In contrast, the Experimental Methods course is sometimes the only one that deals with the process of research, from big questions about why we do experiments and what it means to make a causal inference, all the way to the tiny details of project organization, like what to name your directories and how to make sure you don’t lose your data in a computer crash.

This observation leads to our book’s title. “Experimentology” is the set of practices, findings, and approaches that enable the construction of robust, precise, and generalizable experiments.

The centerpiece of the Experimental Methods course is a replication project, reflecting a teaching model first described in Frank and Saxe (2012) and later expanded on in Hawkins et al. (2018). Each student chooses a published experiment from the literature and collects new data on a preregistered version of the same experimental paradigm, comparing their result to the original publication. Over the course of the quarter, we walk through how to set up a replication experiment, how to preregister confirmatory analyses, and how to write a reproducible report on the findings. The project teaches concepts like reliability and validity, which allow students to analyze choices that the original experimenters made – often choices that could have been made differently in hindsight!

At the end of the course, we reap the harvest of these projects. The project presentations are a wonderful demonstration of both how much the students can accomplish in a quarter and also how tricky it can be to reproduce (redo the calculations on the original data) and replicate (recover similar results in new data) the published literature. Often our replication success rate for the course hovers just above 50%, an outcome that can be disturbing or distressing for students who assume that the published literature reports the absolute truth.

Figure 1: This book has fun stuff going on in the margins!

This book is an attempt to distill some of the lessons of the course (and students’ course projects) into a textbook. We’ll tell the story of the major shifts in psychology that have come about in the last ten years, including both the “replication crisis” (Open Science Collaboration 2015 et seq.) and the positive methodological reforms that have resulted from it. Using this story as motivation, we will highlight the importance of transparency during all aspects of the experimental process from planning to dissemination of materials, data, and code.

What this book is and isn’t about

This book is about psychology experiments. These will typically be short studies conducted online or in a single visit to a lab, often with a convenience population. When we say “experiments” here we mean randomized experiments where some aspect of the participants’ experience is manipulated by the experimenter and then some outcome variable is measured.1

  • 1 We use bold to indicate the introduction of new technical terms.

The central thesis of the book is that:

    Experiments are intended to make maximally unbiased, generalizable, and precise estimates of specific causal effects.

We’ll follow the implications of this thesis for a host of topics, including causal inference, experimental design, measurement, sampling, preregistration, data analysis, and many others.

Because our focus is on experiments, we won’t be talking much about observational designs, survey methods, or qualitative research; these are important tools and appropriate for a whole host of questions, but they aren’t our focus here. We also won’t go into depth about the many fascinating methodological and statistical issues brought up by single-participant case studies, longitudinal research, field studies, or other methodological variants. Many of the concerns we raise are still important for these types of studies, but some of our advice won’t transfer to these less common designs.

Even for students who are working on non-experimental research, we expect that a substantial part of the book content will still be useful, including chapters on replication (Chapter 3), ethics (Chapter 4), sampling (Chapter 10), project management (Chapter 13), and all of the material on reporting (Chapters 14, 15, 16).

In our writing, we presuppose that readers have some background in psychology, at least at an introductory level. In addition, although we discuss some statistical topics, readers might find these sections more accessible with an undergraduate statistics course under their belt. Finally, our examples are written in the R statistical programming language, and for chapters on statistics and visualization especially (Chapters 5, 6, 7, 15, 16), some familiarity with R will be helpful for understanding the code.

How to use this book

The book is organized into five main sections, mirroring the timeline of an experiment: 1) Foundations, 2) Statistics, 3) Design, 4) Execution, and 5) Reporting. We hope that this organization makes it well-suited for teaching or for use as a reference book.2

  • 2 If you are an instructor who is planning to adopt the book for a course, you might be interested in our resources for instructors, including sample course schedules, in Appendix A.

The book is designed for a course for graduate students or advanced undergraduates, but the material is also suitable for self-study by anyone interested in experimental methods, whether in academic psychology or any other context – in or out of academia – in which behavioral experimentation is relevant. We also hope that some readers will come to particular chapters of the book because of an interest in specific topics like measurement (Chapter 8) or sampling (Chapter 10) and will be able to use those chapters as standalone references. And finally, for those interested in the “replication crisis” and subsequent reforms, Chapters 2, 3, 11, 13 will be especially interesting.

Ultimately, we want to give you what you need to plan and execute your own study! Instead of enumerating different approaches, we try to provide a single coherent – and often quite opinionated – perspective, using marginal notes and references to give pointers to more advanced materials or alternative approaches. Throughout, we offer:

• Case studies that illustrate the central concepts of a chapter,
• Accident reports describing examples where poor research practices led to issues in the literature, and
• Depth boxes providing simulations, linkages to advanced techniques, or more nuanced discussion.

While case studies are often integral to the chapters, the other boxes can typically be skipped without issue.


We highlight four major cross-cutting themes for the book: transparency, precision, bias reduction, and generalizability.

• Transparency: For experiments to be reproducible, other researchers need to be able to determine exactly what you did. Thus, every stage of the research process should be guided by a primary concern for transparency. For example, preregistration creates transparency into the researcher’s evolving expectations and thought processes; releasing open materials and analysis scripts creates transparency into the details of the procedure.
• Precision: We want researchers to start planning an experiment by thinking “what causal effect do I want to measure” and to make planning, sampling, design, and analytic choices that maximize the precision of this measurement. A downstream consequence of this mindset is that we move away from a focus on dichotomized inferences about statistical significance and towards analytic and meta-analytic models that focus on continuous effect sizes and confidence intervals (Cumming 2014).
• Bias reduction: While precision refers to random error in a measurement, measurements also have systematic sources of error that bias them away from the true quantity. In our samples, analyses, experimental designs, and in the literature, we need to think carefully about sources of bias in the quantity being estimated.
• Generalizability: Complex behaviors are rarely universal across all settings and populations, and any given experiment can only hope to cover a small slice of the possible conditions where a behavior of interest takes place (Yarkoni 2020). Psychologists must therefore consider the generalizability of their findings at every stage of the process, from stimulus selection and sampling procedures, to analytic methods and reporting.

Throughout the book, we will return to the important relationships between these four concepts, and how the decisions made by the experimenter at every stage of design, data gathering, and analysis bear on the inferences that can be made about the results.
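To make the precision theme concrete, here is a small sketch in R (with simulated data; the 0.5 standard-deviation effect and group sizes are hypothetical, chosen only for illustration) of estimating a standardized effect size with a confidence interval, rather than reporting only a dichotomous significance test:

```r
set.seed(1)

# simulate a two-group experiment (hypothetical true effect of 0.5 SD)
control   <- rnorm(100, mean = 0)
treatment <- rnorm(100, mean = 0.5)

# Welch's t-test gives the raw mean difference and its 95% CI
fit <- t.test(treatment, control)

# a simple standardized effect size (Cohen's d, pooled SD)
d <- (mean(treatment) - mean(control)) /
  sqrt((var(treatment) + var(control)) / 2)

fit$conf.int  # 95% CI on the raw mean difference
d             # standardized effect size
```

Reporting the interval and the effect size keeps the focus on how large the effect is and how precisely it was measured, not just on whether p crossed a threshold.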

The software toolkit for this book

We introduce and advocate for an approach to reproducible study planning, analysis, and writing. This approach depends on an ecosystem of open-source software tools, which we introduce in the book’s appendices.3

  • 3 These appendices are available online at https://experimentology.io but not in the print version of the book, since their content is best viewed in the web format.

• The R statistical programming language and the RStudio integrated development environment,
• Version control using git and GitHub, allowing collaboration on text documents like code, prose, and data, storing and integrating contributions over time (Appendix B),
• The RMarkdown and Quarto tools for creating reproducible reports that can be rendered to a variety of formats (Appendix C),
• The tidyverse family of R packages, which extend the basic functionality of R with simple tools for data wrangling, analysis, and visualization (Appendix D), and
• The ggplot2 plotting package, which makes it easy to create flexible data visualizations for both confirmatory and exploratory data analyses (Appendix E).

Where appropriate, we provide code boxes that show the specific R code used to create our examples.
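As a small taste of this workflow (this example is not from the book itself; the data and the two-condition design are simulated for illustration), a typical tidyverse-plus-ggplot2 pipeline might look like this:

```r
library(tidyverse)  # loads dplyr, ggplot2, tibble, and friends

set.seed(123)

# simulated data from a hypothetical two-condition experiment
d <- tibble(
  condition = rep(c("control", "treatment"), each = 50),
  outcome   = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5))
)

# wrangle: condition means with approximate 95% confidence intervals
d_summary <- d |>
  group_by(condition) |>
  summarise(mean_outcome = mean(outcome),
            ci = 1.96 * sd(outcome) / sqrt(n()))

# visualize: plot the mean and interval for each condition
ggplot(d_summary,
       aes(x = condition, y = mean_outcome,
           ymin = mean_outcome - ci, ymax = mean_outcome + ci)) +
  geom_pointrange()
```

The same script handles data wrangling, analysis, and visualization, which is what makes the full analysis reproducible end to end.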


Thanks for joining us for Experimentology! Whether you are casually browsing, doing readings for a course, or using the book as a reference in your own experimental work, we hope you find it useful. Throughout, we have tried to practice what we preach in terms of reproducibility, and so the full source code for the book is available at https://github.com/langcog/experimentology. We encourage you to browse, comment, and log issues or suggestions.4

  • 4 If you want to log a specific issue, please feel free to create an issue on our GitHub page at https://github.com/langcog/experimentology/issues.

References

Cumming, Geoff. 2014. “The New Statistics: Why and How.” Psychological Science 25 (1): 7–29.
Frank, Michael C, and Rebecca Saxe. 2012. “Teaching Replication.” Perspectives on Psychological Science 7: 595–99.
Hawkins, Robert D, Eric N Smith, Carolyn Au, Juan Miguel Arias, Rhia Catapano, Eric Hermann, Martin Keil, et al. 2018. “Improving the Replicability of Psychological Science Through Pedagogy.” Advances in Methods and Practices in Psychological Science 1 (1): 7–18.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251).
Yarkoni, Tal. 2020. “The Generalizability Crisis.” Behavioral and Brain Sciences 45: 1–37.