How do we create generalizable theories of human behavior? Experiments provide us a tool for measuring causal effects, which provide the basis for building theories. If we design our experiments appropriately, we can even begin to estimate generalizable relationships between different psychological constructs. But how do you do an experiment?
This book provides an introduction to the workflow of the experimental researcher in the psychological sciences. The organization is sequential, from the planning stages of the research process through design, data collection, analysis, and reporting. We introduce these concepts via narrative examples from a range of sub-disciplines, including cognitive, developmental, and social psychology. Throughout, we also illustrate the pitfalls that led to the “replication crisis” in psychology. To help researchers avoid these pitfalls, we advocate for an open-science based approach, providing readers with guidance for preregistration, project management, data sharing, and reproducible writing.
Experimental Methods (Psych 251) is the foundational course for incoming graduate students in the Stanford psychology department. For the last ten years, one of us (Frank) has taught this course and most of us (Hawkins, Cachia, Hardwicke, Mathur, Williams) have TA’d, taken, or otherwise contributed to the course. The goal is to orient students to the nuts and bolts of doing behavioral experiments, including how to plan and design a solid experiment and how to avoid common pitfalls regarding design, measurement, and sampling.
Almost all of students’ coursework both before and in graduate school deals with the content of their research, including theories and results in their areas of focus. In contrast, the course is sometimes the only one that deals with the process of research, from big questions about why we do experiments and what it means to make a causal inference all the way to the tiny details of organization, like what to name your directories and how to make sure you don’t lose your data in a computer crash.
This observation leads to our book’s title. Experimentology is not psychology, cognitive science, or any other body of content knowledge – but rather the set of practices, findings, and approaches that help to enable the construction of robust, precise, and generalizable experiments.
The centerpiece of our course is a replication project, reflecting a teaching model first described in Frank & Saxe (2012) and later expanded on in Hawkins et al. (2018). Each student chooses a published experiment in the literature and collects new data on a pre-registered version of the same experimental paradigm, comparing their result to the original publication. Over the course of the quarter, we walk through how to set up a replication experiment, how to pre-register confirmatory analyses, and how to write a reproducible report on the findings. The project provides numerous object lessons for teaching concepts like reliability and validity, which allow students to analyze choices that the original experimenters made – often choices that could have been made differently in hindsight!
At the end of the course, we reap the harvest of projects. The project presentations are a wonderful demonstration of both how much the students can accomplish in a quarter and also how tricky it can be to reproduce (redo calculations in the original data) and replicate (recover similar results in new data) the published literature. Often our replication rate for the course hovers just above 50%, an outcome that can be disturbing or distressing for students who assume that the published literature reports the absolute truth.
This book has fun stuff going on in the margins!
This book is an attempt to distill some of the lessons of the course (and the last ten years of course projects) into a textbook. We’ll tell the story of the major shifts in psychology that have come about in the last ten years, including both the “replication crisis” (Open Science Collaboration, 2015 et seq.) and the positive methodological reforms that have resulted from it. Using this story as motivation, we will highlight the importance of transparency during all aspects of the experimental process from planning to dissemination of materials, data, and code.
This book is about psychology experiments. These will be typically be short studies conducted online or in a single visit to a lab, often with a convenience population. When we say “experiments” here we mean randomized experiments where some aspect of the participants’ experience is manipulated by the experimenter and then some outcome variable is measured.
The central thesis of the book is that:
Experiments are intended to make maximally unbiased, generalizable, and precise estimates of specific causal effects.
We’ll follow the implications of this thesis for a host of topics, including causal inference, experimental design, measurement, sampling, preregistration, data analysis, and many others.
Because our focus is on experiments, we won’t be talking much about observational designs, survey methods, or qualitative research; these are important tools and appropriate for a whole host of questions, but they aren’t our topic here. We also won’t go into depth about the many fascinating methodological and statistical issues brought up by single-participant case studies, longitudinal research, field studies, or other methodological variants. Many of the concerns we raise are still important for these types of studies, but some of our advice won’t transfer to these critical but more unusual cases.
In our writing, we presuppose that readers have some background in psychology, at least at an introductory level. In addition, although we give a treatment of some statistical topics, readers might enjoy statistical chapters more with an undergraduate statistics course under their belt. Finally, our examples are written in the R statistical programming language, and for chapters on statistics and visualization especially (Chapters 5 – 7, 15, and 16), some familiarity with R will be helpful for understanding the code.
The book is organized into five main sections, mirroring the timeline of an experiment: 1) Preliminaries, 2) Statistics, 3) Design, 4) Execution, and 5) Reporting. We hope that this organization makes it well-suited for teaching or for use as a reference book for self-study, and we provide a number of resources for instructors in Appendix E.
The intended audience for the book is graduate students, advanced undergraduates, or self-learners with some domain knowledge in psychology. We also hope that some readers will come to particular chapters of the book because of an interest in specific topics like measurement (Chapter 8) or sampling (Chapter 10) and will be able to use those chapters as standalone references. And finally, for those interested in the “replication crisis” and reforms that have taken place in the behavioral sciences in the wake of it, Chapters 2, 3, 11 and 13 will be especially interesting.
We want to give you what you need to plan and execute your own study! Instead of enumerating different approaches, we try to provide a single coherent – and often quite opinionated – perspective, using marginal notes and references to give pointers to more advanced materials or alternative approaches. Throughout, we offer:
While case studies are often integral to the chapters, the other boxes can typically be skipped without issue.
We highlight four major cross-cutting themes for the book: transparency, precision, bias reduction, and generalizability.
Throughout the book, we will return to the important relationships between these four concepts, and how the decisions made by the experimenter at every stage of design, data collection, and analysis bear on the inferences that can be made about the results.
We introduce and advocate for an approach to reproducible study planning, analysis, and writing. This approach depends on an ecosystem of open-source software tools, which we introduce in the book’s appendices.
git
and GitHub, allowing collaboration on text documents like code, prose, and data, storing and integrating contributions over time (Appendix A),RMarkdown
format for creating reproducible reports that can be rendered to a variety of formats (Appendix B),tidyverse
family of R packages, which extend the basic functionality of R with simple tools for data wrangling, analysis, and visualization (Appendix C), andggplot2
plotting package, which makes it easy to create flexible data visualizations for both confirmatory and exploratory data analyses (Appendix D).Where appropriate, we provide code boxes that show the specific R
code to be used to recreate our examples.
Thanks for joining us for Experimentology! Whether you are casually browsing, doing readings for a course, or using the book as a reference in your own experimental work, we hope you find it useful. Throughout, we have tried to practice what we preach in terms of reproducibility, and so the full source code for the book is available at http://github.com/langcog/experimentology; we encourage you to browse, comment, and log issues or suggestions.