17 Conclusion

You’ve made it to the end of Experimentology, our (sometimes opinionated) guide to how to run good psychology experiments. In this book we’ve tried to present a unified approach to the why and how of running experiments, centered on a single view of what experiments are for:

Experiments are intended to make maximally unbiased, generalizable, and precise estimates of specific causal effects.

This formulation isn’t exactly how experiments are talked about in the broader field, but we hope you’ve started to see some of the rationale behind this approach. In this final chapter, we will briefly discuss some aspects of our approach, as well as how it connects with our four themes: transparency, measurement precision, bias reduction, and generalizability. We’ll end by mentioning some exciting new trends in the field that give us hope about the future of experimentology and psychology more broadly.

17.1 Summarizing our approach

The Experimentology approach is grounded both in an appreciation of the power of experiments to reveal important aspects of human psychology and in an understanding of the many ways that experiments can fail. In particular, the “replication crisis” (chapter 3) has revealed that small samples, a focus on dichotomous statistical inference, and a lack of transparency around data analysis can lead to a literature that is often neither reproducible nor replicable. Our approach is designed to avoid these pitfalls.

We focus on measurement precision in service of estimating causal effects. The emphasis on causal effects stems from an acknowledgment of the key role of experiments in establishing causal inferences (chapter 1) and the importance of causal relationships to theories (chapter 2). In our statistical approach, we focus on estimation (chapter 5) and modeling (chapter 7), which helps us avoid some of the fallacies that come along with dichotomous inference (chapter 6). We choose measures to maximize reliability (chapter 8). We prefer simple, within-participant experimental designs because they typically result in more precise estimates (chapter 9). And we think meta-analytically about the overall evidence for a particular effect beyond our individual experiment (chapter 16).

Further, we recognize the presence of many potential sources of bias in our estimates, leading us to focus on bias reduction. In our measurements, we identify arguments for the validity of our measures, decreasing bias in estimation of the key constructs of interest (chapter 8); in our designs we seek to minimize bias due to confounding or experimenter effects (chapter 9). We also try to minimize the possibility of bias in our decisions about data collection (chapter 12) and data analysis (chapter 11). Finally, we recognize the possibility of bias in literatures as a whole and consider ways to compensate in our estimates (chapter 16).

We also consider generalizability throughout the process. We theorize with respect to a particular population (chapter 2) and select our sample in order to maximize the generalizability of our findings to that target population (chapter 10). In our statistical analysis, we take into account multiple dimensions of generalizability, including generalization across participants and across experimental stimulus items (chapter 7). And in our reporting, we contextualize our findings with respect to limits on their generalizability (chapter 14).

Woven throughout this narrative is the hope that embracing transparency throughout the experimental process will help you maximize the value of your work. Not only is sharing your work openly an ethical responsibility (chapter 4), but it’s also a great way to minimize errors while creating valuable products that both advance scientific progress and accelerate your own career (chapter 13).

17.2 Forward the field

We have focused especially on reproducibility and replicability, but as we’ve noted at various points in this book, psychology has been described as facing a replication crisis (Open Science Collaboration 2015), a theory crisis (Oberauer and Lewandowsky 2019), and a generalizability crisis (Yarkoni 2020). Given all these crises, you might think that we are pessimistic about the future of psychology. Not so.

There have been tremendous changes in psychological methods since we started teaching Experimental Methods in 2012. When we began, it was common for incoming graduate students to describe the rampant p-hacking they had been encouraged to do in their undergraduate labs. Now, students join the class already aware of new practices like preregistration and cognizant of problems of generalizability and theory building. It takes a long time for a field to change, but we have seen real progress at every level, from government policies requiring transparency in the sciences all the way down to individual researchers’ adoption of tools and practices that increase the efficiency of their work and decrease the chances of error.

One of the most exciting trends has been the rise of metascience, in which researchers use the tools of science to understand how to make science better (Hardwicke et al. 2020). Reproducibility and replicability projects (reviewed in chapter 3) can help us measure the robustness of the scientific literature. In addition, studies that evaluate the impacts of new policies (e.g., Hardwicke et al. 2018) can help stakeholders like journal editors and funders make informed choices about how to push the field toward more robust science.

Beyond changes that correct methodological issues, the last ten years have seen the rise of “big team science” efforts that advance the field in new ways (Coles et al. 2022). Collaborations such as the Psychological Science Accelerator (Moshontz et al. 2018) and ManyBabies (Frank et al. 2017) allow hundreds of researchers from around the world to come together to run shared projects. These projects are enabled by open science practices like data and code sharing, and they give researchers a way to learn best practices by participating. And by including broader and more diverse samples, they can help address challenges around generalizability (Klein et al. 2018).

Finally, the last ten years have seen huge progress in the use of statistical models, both for understanding data (McElreath 2018) and for describing specific psychological mechanisms (Ma, Kording, and Goldreich 2023). In our own work, we have used these models extensively, and we believe that they provide an exciting toolkit for building quantitative theories that allow us to explain and predict the workings of the human mind.

17.3 Final thoughts

Doing experiments is a craft, one that requires practice and attention. The first experiment you run will have limitations and issues. So will the 100th. But as you refine your skills, the quality of the studies you design will get better. Further, your own ability to judge others’ experiments will improve as well, making you a more discerning consumer of empirical results. We hope you enjoy this journey!

Coles, Nicholas A, J Kiley Hamlin, Lauren L Sullivan, Timothy H Parker, and Drew Altschul. 2022. “Build Up Big-Team Science.” Nature Publishing Group.

Frank, Michael C, Elika Bergelson, Christina Bergmann, Alejandrina Cristia, Caroline Floccia, Judit Gervain, J Kiley Hamlin, et al. 2017. “A Collaborative Approach to Infant Research: Promoting Reproducibility, Best Practices, and Theory-Building.” Infancy 22 (4): 421–35.

Hardwicke, Tom E, Maya B Mathur, Kyle Earl MacDonald, Gustav Nilsonne, George Christopher Banks, Mallory Kidwell, Alicia Hofelich Mohr, et al. 2018. “Data Availability, Reusability, and Analytic Reproducibility: Evaluating the Impact of a Mandatory Open Data Policy at the Journal Cognition.” Royal Society Open Science 5. https://doi.org/10.1098/rsos.180448.

Hardwicke, Tom E, Stylianos Serghiou, Perrine Janiaud, Valentin Danchev, Sophia Crüwell, Steven N. Goodman, and John P. A. Ioannidis. 2020. “Calibrating the Scientific Ecosystem through Meta-Research.” Annual Review of Statistics and Its Application 7 (1): 11–37. https://doi.org/10.1146/annurev-statistics-031219-041104.

Klein, Richard A, Michelangelo Vianello, Fred Hasselman, Byron G Adams, Reginald B Adams Jr, Sinan Alper, Mark Aveyard, et al. 2018. “Many Labs 2: Investigating Variation in Replicability across Samples and Settings.” Advances in Methods and Practices in Psychological Science 1 (4): 443–90.

Ma, Wei Ji, Konrad Paul Kording, and Daniel Goldreich. 2023. Bayesian Models of Perception and Action: An Introduction. MIT Press.

McElreath, Richard. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman & Hall/CRC.

Moshontz, Hannah, Lorne Campbell, Charles R Ebersole, Hans IJzerman, Heather L Urry, Patrick S Forscher, Jon E Grahe, et al. 2018. “The Psychological Science Accelerator: Advancing Psychology through a Distributed Collaborative Network.” Advances in Methods and Practices in Psychological Science 1 (4): 501–15.

Oberauer, Klaus, and Stephan Lewandowsky. 2019. “Addressing the Theory Crisis in Psychology.” Psychonomic Bulletin & Review 26 (5): 1596–1618.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251).

Yarkoni, Tal. 2020. “The Generalizability Crisis.” Behavioral and Brain Sciences 45: 1–37.