Chapter 11 Preregistration

  • Recognize the dangers of researcher degrees of freedom
  • Understand the differences between exploratory and confirmatory modes of research
  • Articulate how preregistration can reduce risk of bias and increase transparency

When not planned beforehand, data analysis can approximate a projective technique, such as the Rorschach, because the investigator can project on the data his own expectancies, desires, or biases and can pull out of the data almost any “finding” he may desire.

— Theodore X. Barber (1976)

The first principle is that you must not fool yourself–and you are the easiest person to fool… After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.

— Richard Feynman (1974)

Although there are plenty of incorrect ways to design and analyze experiments, there is no single correct way. In fact, for most research decisions there are a multitude of justifiable options. For example, will you stop data collection after 20, 200, or 2000 participants? Will you remove outliers, and if so, how will you define them? Will you conduct subgroup analyses to see whether the results are affected by sex, age, or some other factor?

Consider a simplified, hypothetical case where you need to make five analysis decisions and have five justifiable options for each decision — this alone would result in 3125 (\(5^5\)) unique ways to analyze your data! Now imagine you read a paper reporting the result of just one of these analyses, with no mention that the other choices that could have been taken would have yielded very different results. This undisclosed flexibility would dramatically decrease your confidence in the reported finding.
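To make the combinatorial explosion concrete, here is a minimal sketch in Python; the decision points and option labels are hypothetical, chosen only for illustration. It enumerates every pathway implied by five decisions with five justifiable options each:

```python
# Minimal sketch: enumerate hypothetical analysis pathways.
from itertools import product

# Five illustrative decision points, each with five justifiable options
# (these labels are made up for the example).
decisions = {
    "stopping_rule": [20, 50, 200, 500, 2000],                    # when to stop data collection
    "outlier_rule": ["none", "2sd", "3sd", "iqr", "winsorize"],   # how to handle outliers
    "transform": ["raw", "log", "sqrt", "rank", "z"],             # how to transform the outcome
    "covariates": ["none", "age", "sex", "age+sex", "site"],      # which covariates to adjust for
    "subgroup": ["all", "by_sex", "by_age", "by_site", "by_ses"], # which subgroup analyses to run
}

pathways = list(product(*decisions.values()))
print(len(pathways))  # 5 ** 5 = 3125 unique ways to analyze the same dataset
```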

In this chapter, we will find out why undisclosed flexibility in the design, analysis, reporting, and interpretation of experiments can lead to scientists fooling themselves and fooling each other. Collectively, these research decisions are known as “researcher degrees of freedom”. Decisions that relate to transforming data into evidence are known as “evidentiary degrees of freedom” and decisions that relate to the interpretation of the evidence are known as “interpretive degrees of freedom” (Hardwicke & Wagenmakers, 2021). We will also learn how preregistration – the process of writing down and registering your design and analysis decisions before you observe study outcomes – and other tools can be used to protect our research from bias and provide the transparency that other scientists need to properly evaluate and interpret our work.

Our bottom line is that the best practice is to document your experiment – including critical design, sampling, and analysis decisions – before collecting data. This documentation can help you think through your choices to ensure that they are maximally aligned with your goals. Further, the documentation can be time-stamped using an external registry and shared so as to show which decisions were post hoc (after observing study outcomes) and which were made in advance.

Undisclosed analytic flexibility?

Educational apps for children are a huge market, but relatively few high-quality, randomized trials have been done to see whether or when they produce educational gains. Berkowitz et al. (2015) reported a high-quality field experiment of educational apps, with participants randomly assigned to either a math or reading app over the course of a full school year. Critically, along with random assignment, the study also included standardized measures of math and reading achievement. These measures allowed the authors to compute effects in grade-level equivalents, a meaningful unit from a policy perspective. The key result reported by the paper is shown in Figure 11.1. Families who used the math app frequently showed greater gains in math than the control group.

Figure 11.1: Figure 1 of Berkowitz et al. (2015). Estimated years of math achievement gained over the school year across groups.

Although this finding appeared striking, the figure didn’t directly visualize the primary causal effect of interest, namely the size of the effect of study condition on math scores. Instead the data were presented as estimated effects for specific levels of app usage, for a “matched” subgroup of participants (panel A) and the entire group (panel B).

Figure 11.2: Estimated years of math achievement gained over the school year across groups in the Berkowitz et al. (2015) math app trial. Error bars show 95% confidence intervals. Figure reproduced from Frank (2016).

Because the authors made their data openly available, it was possible for Frank (2016) to do a simple first analysis to examine the causal effect of interest. When not splitting the data by usage, there was no significant main effect of the intervention on math performance [Figure 11.2]. Since this analysis was not favorable to the primary intervention – and because it was not reported in the paper – it was possible that the authors had analyzed the data several ways and chosen to present an analysis that was more favorable to their hypotheses of interest. The authors responded that their analyses were based on prior research and argued that the disagreement about how the data should be analyzed was a question of different approaches (Berkowitz et al., 2016).

Readers of Berkowitz et al. (2015) couldn’t know whether the analysis was influenced by looking at the data. As we’ll see below, data-dependent analysis can lead to substantial bias in reported effects. If the analysis plan had been preregistered, readers could have been confident that the reported analysis was not chosen on the basis of the outcomes – a simple step that would have increased the value of this otherwise high-quality study.

11.1 Lost in a garden of forking paths

Figure 11.3: The garden of forking paths: many justifiable but different analytic choices are possible for an individual dataset.

One way to visualize evidentiary degrees of freedom is as a vast decision tree or “garden of forking paths” [Gelman & Loken (2014); Figure 11.3]. Each node represents a decision point in the analysis process and each branch represents a justifiable choice. Each unique pathway through the garden terminates in an individual research outcome.

Because scientific observations typically consist of both noise (random variation unique to this sample) and signal (regularities that will recur in other samples), some of these pathways will inevitably lead to outcomes that are misleading (e.g., inflated effect sizes, exaggerated evidence, or false positives). The signal-to-noise ratio is worse in situations (alas, common in psychology) that involve small effect sizes, high variation, and large measurement errors (Ioannidis, 2005). Evidentiary degrees of freedom may be constrained to some extent by strong theory (Oberauer & Lewandowsky, 2019), community methodological norms and standards, or replication studies, though these constraints may be more implicit than explicit, and can still leave plenty of room for flexible decision-making. The more potential paths there are in the garden that you might explore, the higher the chance of encountering misleading outcomes.
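A quick simulation illustrates why more paths mean more misleading outcomes. The sketch below is purely illustrative (it is not drawn from any of the studies cited here): it generates data with no true group difference, then tries a handful of justifiable analysis variants and keeps the best-looking p-value. Each individual test has a nominal 5% false positive rate, but the chance that at least one variant comes out “significant” is considerably higher:

```python
# Minimal sketch: trying several justifiable analyses on pure noise
# inflates the false positive rate. Illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def drop_outliers(v):
    """Exclude values more than 2 SD from the mean (one of many justifiable rules)."""
    return v[np.abs(v - v.mean()) < 2 * v.std()]

def forked_pvalues(x, y):
    """p-values from several 'justifiable' variants of the same group comparison."""
    return [
        stats.ttest_ind(x, y).pvalue,                                # planned analysis: plain t-test
        stats.ttest_ind(drop_outliers(x), drop_outliers(y)).pvalue,  # after outlier exclusion
        stats.mannwhitneyu(x, y).pvalue,                             # nonparametric alternative
        stats.ttest_ind(x[:30], y[:30]).pvalue,                      # "peeking" after 30 per group
    ]

n_sim, n = 2000, 60
planned_hits = forked_hits = 0
for _ in range(n_sim):
    x, y = rng.normal(size=n), rng.normal(size=n)  # no true effect exists
    pvals = forked_pvalues(x, y)
    planned_hits += pvals[0] < 0.05                # report only the planned test
    forked_hits += min(pvals) < 0.05               # report whichever variant looks best

print(f"False positive rate, planned test only: {planned_hits / n_sim:.2f}")  # ~0.05
print(f"False positive rate, best of 4 forks:   {forked_hits / n_sim:.2f}")   # noticeably higher
```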

Statisticians refer to this issue as a multiplicity (multiple comparisons) problem. As we discussed in Chapter 6, multiplicity can be addressed to some extent with statistical countermeasures, like the Bonferroni correction; however, these adjustment methods need to account for every path that you could have taken (de Groot, 1956/2014; Gelman & Loken, 2014). When you navigate the garden of forking paths during data analysis, it is easy to forget – or even be unaware of – every path that you could have taken, so these corrections can no longer be applied effectively.
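For reference, here is what the Bonferroni adjustment itself looks like – a minimal sketch that assumes you really did record all m comparisons you ran, which is exactly the assumption that implicit forking undermines:

```python
# Minimal sketch of a Bonferroni correction across m recorded comparisons.
import numpy as np

pvals = np.array([0.004, 0.021, 0.048, 0.230])  # hypothetical p-values from m = 4 planned tests
alpha = 0.05
m = len(pvals)

reject = pvals < alpha / m             # test each p-value against the adjusted threshold alpha / m
adjusted = np.minimum(pvals * m, 1.0)  # or equivalently, report Bonferroni-adjusted p-values

print(reject)    # [ True False False False]
print(adjusted)  # [0.016 0.084 0.192 0.92 ]
# statsmodels.stats.multitest.multipletests offers this and less conservative alternatives.
```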

11.1.1 Outcome-dependent analysis

When a researcher navigates the garden of forking paths during data analysis, their decisions can also be biased – they are not only making choices about analysis, they are doing so on the basis of how those choices affect the research outcomes (outcome-dependent decision making). If a researcher is seeking a particular kind of outcome (which is likely – see the depth box below), then they are more likely to follow the branches that steer them in that direction.

You could think of this a bit like playing a game of “hot (🔥) or cold (☃️)”, where 🔥 indicates that the choice will move the researcher closer to a desirable overall outcome and ☃️ indicates that the choice will move them further away. Each time the researcher reaches a decision point, they try one of the branches and get feedback on how that choice affects the outcome. If the feedback is 🔥, they take that branch. If the answer is ☃️, they try a different branch. If they reach the end of a complete pathway and the outcome is ☃️, maybe they even retrace their steps and try some different branches earlier in the pathway. This strategy creates a risk of bias because the research outcomes are being systematically skewed towards the researcher’s preferences (Hardwicke & Wagenmakers, 2021). We say “risk of bias” rather than just “bias” because in most scientific contexts we do not have a known ground truth to compare the outcomes to, so in any specific situation we do not know the extent to which outcome-dependent analyses have actually biased the outcomes.

Only human: Cognitive biases and skewed incentives

There’s a storybook image of the scientist as an objective, rational, and dispassionate arbiter of truth (Veldkamp et al., 2017). But in reality, scientists are only human: they have egos, career ambitions, and rent to pay! So even if we do want to live up to the storybook image, it’s important to acknowledge that our decisions and behavior are also influenced by a range of cognitive biases and external incentives that can steer us away from that goal. Let’s first look at some relevant cognitive biases that might lead scientists astray:

  • Confirmation bias: Preferentially seeking out, recalling, or evaluating information in a manner that reinforces one’s existing beliefs (Nickerson, 1998).

  • Hindsight bias: After an event has occurred, believing that it was more predictable or likely than we actually judged it to be beforehand (“I knew it all along!”) (Slovic & Fischhoff, 1977).

  • Motivated reasoning: Rationalizing prior decisions so they are framed in a favorable light, even if they were irrational (Kunda, 1990).

  • Apophenia: Perceiving meaningful patterns in random data (Gilovich et al., 1985; see Figure 11.4).

Figure 11.4: Examples of apophenia: Mars Face, Winnie the Pooh Cloud, and Jesus Toast.

Figure 11.5: The Chrysalis Effect, when ugly truth becomes a beautiful fiction.

To make matters worse, the incentive structure of the scientific ecosystem often provides extra motivation to get things wrong. The allocation of funding, awards, and publication prestige is often based on the nature of research outcomes rather than research quality (B. A. Nosek et al., 2012; Smaldino & McElreath, 2016). For example, many academic journals, especially those that are widely considered to be the most prestigious, appear to have a preference for novel, positive, and statistically significant outcomes over incremental, negative, or null outcomes (Bakker et al., 2012). There is also pressure to write articles with concise, coherent, and compelling narratives (Giner-Sorolla, 2012). This set of forces incentivizes scientists to be “impressive” over being right and encourages questionable research practices. The process of iteratively p-hacking and HARKing one’s way to a “beautiful” scientific paper has been dubbed “The Chrysalis Effect” [O’Boyle et al. (2017); Figure 11.5].

In sum, scientists’ human flaws – and the scientific ecosystem’s flawed incentives – highlight the need for transparency and intellectual humility when reporting the findings of our research (Hoekstra & Vazire, 2020).

Figure 11.6: By deliberately exploiting analytic flexibility in the processing pipeline of fMRI data, Bennett et al. (2009) were able to identify ‘brain activity’ in a dead Atlantic Salmon.

In the most egregious cases, a researcher may try multiple pathways until they obtain a desirable outcome and then selectively report that outcome, neglecting to mention that they have tried several other analysis strategies. As the saying goes, “if you torture the data long enough, it will confess” (Good, 1972). You may remember an example of this practice in Chapter 3, where participants apparently became younger when they listened to “When I’m 64” by The Beatles. Another example of how damaging the garden of forking paths can be comes from the “discovery” of brain activity in a dead Atlantic Salmon! Researchers deliberately exploited flexibility in the fMRI analysis pipeline and avoided multiple comparisons corrections, allowing them to find apparent brain activity where there was only a dead fish [Figure 11.6; Bennett et al. (2009)].

11.1.2 Hypothesizing after results are known

In addition to evidentiary degrees of freedom, there is additional flexibility in how researchers explain research results. As we discussed in Chapter 2, theories can accommodate even conflicting results in many different ways – for example, by positing auxiliary hypotheses that explain why a particular datapoint is special. We might call these different routes for accommodating theory with data “interpretive degrees of freedom”.

The practice of selecting or developing your hypothesis after seeing the study outcomes has been called “Hypothesizing After the Results are Known”, or “HARKing” (Kerr, 1998). HARKing is potentially problematic because it expands the garden of forking paths and helps to justify the use of various evidentiary degrees of freedom (Figure 11.7). For example, you may come up with an explanation for why an intervention is effective in men but not in women in order to justify a post-hoc subgroup analysis based on sex (see Case Study). The extent to which HARKing is problematic is contested (for discussion see Hardwicke & Wagenmakers, 2021). But at the very least it’s important to be honest about whether hypotheses were developed before or after observing research outcomes.

Figure 11.7: A grid of individual research outcomes. The horizontal axis provides a simplified illustration of the many justifiable design and analysis choices that the scientist can use to generate the evidence. The vertical axis illustrates that there are often several potential hypotheses derived from those theories, which could be constructed or selected when interpreting the evidence. An unconstrained scientist can simultaneously fit evidence to hypotheses and fit hypotheses to evidence in order to obtain their preferred study outcome.


But hang on a minute! Isn’t it a good thing to seek out interesting results if they are there in the data? Shouldn’t we “let the data speak”? The answer is yes! Exploratory research is not the same as p-hacking. P-hacking is explicitly dishonest because it involves deliberately withholding information. In contrast, exploratory data analysis is a critical part of the scientific process.

The important things to remember about exploratory research are that you need to (1) be aware of the increased risk of bias and calibrate your confidence in the outcomes accordingly; and (2) be honest with other researchers about your analysis strategy so they are also aware of the risk of bias and can calibrate their confidence in the outcomes accordingly. It’s important to understand the distinction between exploratory and confirmatory research modes (in practice, an individual study may contain both exploratory and confirmatory aspects, which is why we describe them as different “modes”). Confirmatory research involves making design and analysis decisions before research outcomes have been observed; exploratory research involves making them afterwards. In the next section, we will learn how to work in a confirmatory mode using preregistration.

11.2 Reducing bias, increasing transparency, and calibrating confidence with preregistration

You can counter the problem of undisclosed researcher degrees of freedom by making research decisions before you are aware of the research outcomes – like planning your route through the garden of forking paths before you start your journey (Hardwicke & Wagenmakers, 2021; Wagenmakers et al., 2012).

Preregistration is the process of declaring your research decisions in a public registry before you analyze (and often before you collect) the data. Preregistration ensures that your decisions are outcome-independent, which reduces the risk of bias arising from the issues described above. Preregistration also transparently conveys to others what you planned, helping them to determine the risk of bias and calibrate their confidence in the research outcomes. In other words, preregistration provides the context needed to properly evaluate and interpret research, and it dissuades researchers from engaging in questionable research practices like p-hacking and undisclosed HARKing, because they can be held accountable to their original plan.

Preregistration does not require that you specify all research decisions in advance, only that you are transparent about what was planned and what was not planned. This transparency helps readers distinguish which aspects of the research were exploratory and which were confirmatory (Figure 11.8). All else being equal, we should have more confidence in confirmatory findings, because there is a lower risk of bias. Exploratory analyses have a higher risk of bias, but they are also more sensitive to serendipitous (unexpected) discoveries. Exploratory and confirmatory research are both valuable activities – it is just important to differentiate them (Tukey, 1980)! Preregistration offers the best of both worlds by clearly separating one from the other.

Figure 11.8: Preregistration clarifies where research activities fall on the continuum of prespecification. When the preregistration provides little constraint over researcher degrees of freedom (i.e., more exploratory research), decisions are more likely to be outcome-dependent, and consequently there is a higher risk of bias. When preregistration provides strong constraint over researcher degrees of freedom (i.e., more confirmatory research), decisions are less likely to be outcome-dependent, and consequently there is a lower risk of bias. Exploratory research activities are more sensitive to serendipitous discovery, but also have a higher risk of bias relative to confirmatory research activities. Preregistration transparently communicates where particular research outcomes are located along the continuum, helping readers to appropriately calibrate their confidence.


In addition to the benefits described above, preregistration may improve the quality of research by encouraging closer attention to study planning. We’ve found that the process of writing a preregistration really helps facilitate communication between collaborators, and can catch addressable problems before time and resources are wasted on a poorly designed study. Detailed advance planning can also create opportunities for useful community feedback, particularly in the context of Registered Reports (see the depth box below), where dedicated peer reviewers will evaluate your study before it has even begun.

Preregistration and friends: A toolbox to address researcher degrees of freedom

Several useful tools and concepts can be used to complement or extend preregistration. In general, we recommend that these be combined with preregistration rather than used as a replacement: preregistration provides transparency about the research and planning process, so its function complements other methods for avoiding bias (Hardwicke & Wagenmakers, 2021).

Robustness checks. Robustness checks (also called “sensitivity analyses”) assess how different decision choices in the garden of forking paths affect the eventual pattern of results. This technique is particularly helpful when you have to choose between several justifiable analytic options, none of which seems clearly superior to the others, or which have complementary strengths and weaknesses. For example, you might run the analysis three times using three different methods for handling missing data. Robust results should not vary substantially across the three different choices.
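As a concrete sketch of that example (simulated data and made-up variable names, purely for illustration), the same group comparison can be run under three missing-data strategies and the estimates reported side by side:

```python
# Minimal sketch of a robustness check: one comparison, three missing-data strategies.
# The data are simulated purely for illustration.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({
    "condition": rng.integers(0, 2, size=n),
    "score": rng.normal(loc=0.0, scale=1.0, size=n),
})
df.loc[df.index[:20], "score"] = np.nan  # some scores are missing

def group_difference(data):
    """Mean score difference (condition 1 minus condition 0) and t-test p-value."""
    treat = data.loc[data.condition == 1, "score"]
    ctrl = data.loc[data.condition == 0, "score"]
    return treat.mean() - ctrl.mean(), stats.ttest_ind(treat, ctrl).pvalue

strategies = {
    "listwise deletion": df.dropna(subset=["score"]),
    "mean imputation": df.assign(score=df["score"].fillna(df["score"].mean())),
    "median imputation": df.assign(score=df["score"].fillna(df["score"].median())),
}

for name, data in strategies.items():
    diff, p = group_difference(data)
    print(f"{name:>18}: difference = {diff:+.3f}, p = {p:.3f}")
```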

Multiverse analyses. Recently, some researchers have started running large-scale robustness checks. These have been called “multiverse analysis” (Steegen et al., 2016) or “specification curve analysis” (Simonsohn et al., 2020). These techniques evaluate the factorial intersection of multiple choices for multiple decisions – like simultaneously evaluating thousands of pathways in the garden of forking paths. Some have argued that these large-scale robustness checks can make preregistration redundant; after all, why prespecify a single path if you can explore them all (Oberauer & Lewandowsky, 2019; Rubin, 2020)? But interpreting the results of a multiverse analysis is not straightforward; for example, it seems unlikely that all of the decision choices are equally justifiable (Giudice & Gangestad, 2021). Furthermore, if robustness checks are not preregistered, then they introduce researcher degrees of freedom and create an opportunity for selective reporting, which increases the risk of bias.
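Conceptually, a multiverse analysis crosses every option for every decision and re-runs the analysis once per combination. A minimal sketch with simulated data and two hypothetical decisions (a real multiverse would involve many more):

```python
# Minimal sketch of a multiverse analysis: run one simple analysis under every
# combination of a few processing decisions and inspect the spread of estimates.
from itertools import product
import numpy as np

rng = np.random.default_rng(3)
n = 300
condition = rng.integers(0, 2, size=n)
score = 0.2 * condition + rng.normal(size=n)   # small simulated treatment effect

def analyze(score, condition, outlier_rule, transform):
    """Group difference in means under one specification."""
    y = np.log(score - score.min() + 1) if transform == "log" else score
    if outlier_rule != "none":
        cutoff = {"2sd": 2.0, "3sd": 3.0}[outlier_rule]
        keep = np.abs(y - y.mean()) < cutoff * y.std()
        y, condition = y[keep], condition[keep]
    return y[condition == 1].mean() - y[condition == 0].mean()

decisions = {
    "outlier_rule": ["none", "2sd", "3sd"],
    "transform": ["raw", "log"],
}

estimates = []  # collect one estimate per specification (e.g., for a specification curve plot)
for outlier_rule, transform in product(*decisions.values()):
    est = analyze(score, condition, outlier_rule, transform)
    estimates.append(((outlier_rule, transform), est))
    print(f"outliers={outlier_rule:>4}, transform={transform:>3}: estimate = {est:+.3f}")
```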

Held-out sample. One option to benefit from both exploratory and confirmatory research modes is to split your data into training and test samples (the test sample is commonly called the “held-out” sample because it is held out from the exploratory process). You can generate hypotheses in an exploratory mode in the training sample and use them as the basis to preregister confirmatory analyses in the held-out sample. A notable disadvantage of this strategy is that splitting the data reduces statistical power, but in cases where data are plentiful – including in much of machine learning – this technique is the gold standard.
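A minimal sketch of the split itself (random halves of a simulated dataset; in practice you also need the discipline not to peek at the held-out half until the confirmatory analysis is preregistered):

```python
# Minimal sketch of a training / held-out split for exploratory vs. confirmatory analysis.
import numpy as np

rng = np.random.default_rng(11)
n = 500
data = rng.normal(size=(n, 3))       # stand-in for a participants x variables matrix

indices = rng.permutation(n)         # random assignment of participants to the two halves
n_train = n // 2
train_idx, heldout_idx = indices[:n_train], indices[n_train:]

train = data[train_idx]              # explore freely here and generate hypotheses
heldout = data[heldout_idx]          # touch only after preregistering the confirmatory analysis

print(train.shape, heldout.shape)    # (250, 3) (250, 3)
```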

Masked analysis (traditionally called “blind analysis”). Sometimes problems that you did not anticipate in your preregistered plan, such as missing data, attrition, or randomization failure, can arise during data collection. How do you diagnose and address these issues without increasing the risk of bias through outcome-dependent analysis? One option is masked analysis, which disguises aspects of the data related to the outcomes (for example, by shuffling condition labels or adding noise) while still allowing some degree of data inspection (Dutilh et al., 2019). After diagnosing a problem, you can adjust your preregistered plan without increasing the risk of bias, because you have not engaged in outcome-dependent decision making.
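One simple way to implement masking, sketched below with hypothetical column names, is to give the analyst a copy of the dataset in which the condition labels have been shuffled: data quality and attrition can then be inspected without revealing the outcome of interest.

```python
# Minimal sketch of a masked ("blind") analysis: shuffle condition labels so the
# data can be inspected for problems without revealing the condition effect.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "participant": range(n),
    "condition": rng.integers(0, 2, size=n),   # hypothetical column names
    "outcome": rng.normal(size=n),
})
df.loc[rng.choice(n, size=12, replace=False), "outcome"] = np.nan  # simulated attrition

masked = df.copy()
masked["condition"] = rng.permutation(masked["condition"].to_numpy())  # break the label-outcome link

# Diagnostics run on the masked copy cannot be steered by the (hidden) condition effect.
print(f"Overall attrition: {masked['outcome'].isna().mean():.0%}")
print(masked["outcome"].describe())  # check distributions, outliers, data-entry problems
```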

Standard Operating Procedures. Community norms, perhaps at the level of your research field or lab, can act as a natural constraint on researcher degrees of freedom. For example, there may be a generally accepted approach for handling outliers in your community. You can make these constraints explicit by writing them down in a Standard Operating Procedures document - a bit like a living meta-preregistration (Lin & Green, 2016). Each time you preregister an individual study, you can co-register this document alongside it. Make sure you are clear about which document you will follow in the event of a mismatch!

Open lab notebooks. Maintaining a lab notebook can be a useful way to keep a record of your decisions as a research project unfolds. Preregistration is a bit like taking a snapshot of your lab notebook at the start of the project, when all you have written down is your research plan. Making your lab notebook publicly available is a great way to transparently document your research and departures from the preregistered plan.

Figure 11.9: Registered Reports (https://www.cos.io/initiatives/registered-reports)

Registered Reports. Registered Reports are a type of article format that embeds preregistration directly into the publication pipeline [Chambers & Tzavella (2020); Figure 11.9]. The idea is that you submit your preregistered protocol to a journal and it is peer reviewed before you’ve even started your study. If the protocol is approved, the journal agrees to publish the study regardless of the outcomes. This is a radical departure from traditional publication models, where peer reviewers and journals evaluate your study after it’s been completed and the outcomes are known. Because the study is accepted for publication independently of the outcomes, Registered Reports can offer the benefits of preregistration with additional protection against publication bias. They also provide a great opportunity to obtain feedback on your study design while you can still change it!

11.3 How to preregister

High-stakes studies such as medical trials must be preregistered (Dickersin & Rennie, 2012). In 2005, a large international consortium of medical journals decided that they would not publish unregistered trials. The discipline of economics also has strong norms about study registration (see e.g. https://www.socialscienceregistry.org). But preregistration is actually pretty new to psychology (B. A. Nosek et al., 2018), and there’s still no standard way of doing it – you’re already at the cutting edge!

We recommend using the Open Science Framework (OSF) as your registry. OSF is one of the most popular registries in psychology and you can do lots of other useful things on the platform to make your research transparent, like sharing data, materials, analysis scripts, and preprints. On the OSF it is possible to “register” any file you have uploaded. When you register a file, it creates a time-stamped, read-only copy, with a dedicated link. You can add this link to articles reporting your research.

One approach to preregistration is to write a protocol document that specifies the study rationale, aims or hypotheses, methods, and analysis plan, and register that document. You can think of a study protocol as a bit like a research paper without results and discussion sections (here’s an example from one of our own studies: https://osf.io/2cnkq/). The OSF also has a collection of dedicated preregistration templates that you can use if you prefer. These templates are often tailored to the needs of particular types of research. For example, there are templates for general quantitative psychology research (“PRP-QUANT”; Bosnjak et al., 2021), cognitive modelling (Crüwell & Evans, 2021), and secondary data analysis (Akker et al., 2019). The OSF interface may change, but currently this guide provides a set of steps to create a preregistration.

Once you’ve preregistered your plan, you just go off and run the study and report the results, right? Well hopefully… but things might not turn out to be that straightforward. It’s quite common to forget to include something in your plan or to have to depart from the plan due to something unexpected. Preregistration can actually be pretty hard in practice (B. A. Nosek et al., 2019)!

Don’t worry though – remember that the primary goal of preregistration is transparency, which enables others to evaluate and interpret our work. If you decide to depart from your original plan and conduct outcome-dependent analyses, then this decision may increase the risk of bias. But if you communicate this decision transparently to your readers, they can appropriately calibrate their confidence in the research outcome. You may even be able to run both the planned and unplanned analyses as a robustness check (see the depth box above) to evaluate the extent to which this particular choice impacts the outcomes.

When you report your study, it is important to distinguish between what was planned and what was not. If you ran a lot of outcome-dependent analyses, then it might be worth having separate exploratory and confirmatory results sections. On the other hand, if you mainly stuck to your original plan, with only minor departures, then you could include a table (perhaps in an appendix) that outlines these changes (for example, see Supplementary Information A of this article).

11.4 Chapter summary: Preregistration

We’ve advocated here for preregistering your study plan. This practice allows us to minimize bias caused by outcome-dependent analysis (the “garden of forking paths” that we described). Preregistration is a “plan, not a prison”: in most cases preregistered, confirmatory analyses coexist with exploratory analyses. Both are an important part of good research – the key is to disclose which is which!

  1. P-hack your way to scientific glory! To get a feel for how results-dependent analyses might work in practice, have a play around with this app: https://projects.fivethirtyeight.com/p-hacking/

  2. Preregister your next experiment! The best way to get started with preregistration is to have a go with your next study. Head over to https://osf.io/registries/osf/new and register your study protocol or complete one of the templates.

  • Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115, 2600–2606. https://doi.org/10.1073/pnas.1708274114

  • Hardwicke, T. E., & Wagenmakers, E.-J. (2022). Reducing bias, increasing transparency, and calibrating confidence with preregistration. MetaArXiv. https://doi.org/10.31222/osf.io/d7bcu

References

Akker, O. van den, Weston, S. J., Campbell, L., Chopik, W. J., Damian, R. I., Davis-Kean, P., Hall, A., Kosie, J., Kruse, E., Olsen, J., Ritchie, S. J., Valentine, K. D., Veer, A. van ’t., & Bakker, M. (2019). Preregistration of secondary data analysis: A template and tutorial. PsyArXiv. https://psyarxiv.com/hvfmr/
Bakker, M., Dijk, A. van, & Wicherts, J. M. (2012). The rules of the game called psychological science. Perspectives on Psychological Science, 7(6), 543–554. https://doi.org/10.1177/1745691612459060
Barber, T. X. (1976). Pitfalls in Human Research: Ten Pivotal Points. Pergamon Press.
Bennett, C., Miller, M., & Wolford, G. (2009). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction. NeuroImage, 47, S125. https://doi.org/10.1016/S1053-8119(09)71202-9
Berkowitz, T., Schaeffer, M. W., Maloney, E. A., Peterson, L., Gregor, C., Levine, S. C., & Beilock, S. L. (2015). Math at home adds up to achievement in school. Science, 350(6257), 196–198. https://doi.org/10.1126/science.aac7427
Berkowitz, T., Schaeffer, M. W., Rozek, C. S., Maloney, E. A., Levine, S. C., & Beilock, S. L. (2016). Response to comment on “Math at home adds up to achievement in school.” Science, 351(6278), 1161.
Bosnjak, M., Fiebach, C., Mellor, D. T., Mueller, S., O’Connor, D., Oswald, F., & Sokol-Chang, R. (2021). A template for preregistration of quantitative research in psychology: Report of the joint psychological societies preregistration task force. PsyArXiv. https://doi.org/10.31234/osf.io/d7m5r
Chambers, C., & Tzavella, L. (2020). Registered Reports: Past, present and future. MetaArXiv. https://doi.org/10.31222/osf.io/43298
Crüwell, S., & Evans, N. J. (2021). Preregistration in diverse contexts: A preregistration template for the application of cognitive models. Royal Society Open Science, 8(10), 210155. https://doi.org/10.1098/rsos.210155
de Groot, A. D. (1956/2014). The meaning of “significance” for different types of research (E.-J. Wagenmakers, D. Borsboom, J. Verhagen, R. A. Kievit, M. Bakker, A. O. J. Cramer, D. Matzke, D. Mellenbergh, & H. L. J. van der Maas, Trans.). Acta Psychologica, 148, 188–194. https://doi.org/10.1016/j.actpsy.2014.02.001
Dickersin, K., & Rennie, D. (2012). The evolution of trial registries and their use to assess the clinical trial enterprise. JAMA, 307(17), 1861–1864. https://doi.org/10.1001/jama.2012.4230
Dutilh, G., Sarafoglou, A., & Wagenmakers, E.-J. (2019). Flexible yet fair: Blinding analyses in experimental psychology. Synthese. https://doi.org/10.1007/s11229-019-02456-7
Feynman, R. P. (1974). Cargo Cult Science. http://calteches.library.caltech.edu/51/2/CargoCult.pdf
Frank, M. C. (2016). Comment on “Math at home adds up to achievement in school.” Science, 351(6278), 1161.
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465. https://doi.org/10.1511/2014.111.460
Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17(3), 295–314. https://doi.org/10.1016/0010-0285(85)90010-6
Giner-Sorolla, R. (2012). Science or art? How aesthetic standards grease the way through the publication bottleneck but undermine science. Perspectives on Psychological Science, 7(6), 562–571. https://doi.org/10.1177/1745691612457576
Giudice, M. D., & Gangestad, S. (2021). A traveler’s guide to the multiverse: Promises, pitfalls, and a framework for the evaluation of analytic decisions. Advances in Methods and Practices in Psychological Science, 4(1), 1–15. https://doi.org/10.1177/2515245920954925
Good, I. J. (1972). Statistics and Today’s Problems. The American Statistician, 26(3), 11–19. https://doi.org/10.1080/00031305.1972.10478922
Hardwicke, T. E., & Wagenmakers, E.-J. (2021). Preregistration: A pragmatic tool to reduce bias and calibrate confidence in scientific research. MetaArXiv. https://doi.org/10.31222/osf.io/d7bcu
Hoekstra, R., & Vazire, S. (2020). Intellectual humility is central to science [Preprint]. https://osf.io/edh2s
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
Kerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108(3), 480–498. https://doi.org/10.1037/0033-2909.108.3.480
Lin, W., & Green, D. P. (2016). Standard operating procedures: A safety net for pre-analysis plans. PS: Political Science & Politics, 49(03), 495–500. https://doi.org/10.1017/S1049096516000810
Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175–220. https://doi.org/10.1037/1089-2680.2.2.175
Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., Veer, A. E. van ’t, & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences, 23(10), 815–818. https://doi.org/10.1016/j.tics.2019.07.009
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615–631. https://doi.org/10.1177/1745691612459058
O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43(2), 376–399. https://doi.org/10.1177/0149206314527133
Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5), 1596–1618.
Rubin, M. (2020). Does preregistration improve the credibility of research findings? The Quantitative Methods for Psychology, 16(4), 15. https://doi.org/10.20982/tqmp.16.4.p376
Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 1–7. https://doi.org/10.1038/s41562-020-0912-z
Slovic, P., & Fischhoff, B. (1977). On the psychology of experimental surprises. Journal of Experimental Psychology: Human Perception and Performance, 3(4), 544–551. https://doi.org/10.1037/0096-1523.3.4.544
Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384. https://doi.org/10.1098/rsos.160384
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712. https://doi.org/10.1177/1745691616658637
Tukey, J. W. (1980). We need both exploratory and confirmatory. The American Statistician, 34(1), 23–25. https://doi.org/10.2307/2682991
Veldkamp, C. L. S., Hartgerink, C. H. J., Assen, M. A. L. M. van, & Wicherts, J. M. (2017). Who believes in the storybook image of the scientist? Accountability in Research, 24(3), 127–151. https://doi.org/10.1080/08989621.2016.1268922
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., Maas, H. L. J. van der, & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638. https://doi.org/10.1177/1745691612463078