When not planned beforehand, data analysis can approximate a projective technique, such as the Rorschach, because the investigator can project on the data his own expectancies, desires, or biases and can pull out of the data almost any “finding” he may desire.
The first principle is that you must not fool yourself–and you are the easiest person to fool… After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.
Although there are plenty of incorrect ways to design and analyze experiments, there is no single correct way. In fact, for most research decisions there are a multitude of justifiable options. For example, will you stop data collection after 20, 200, or 2000 participants? Will you remove outlier values, and if so, how will you define them? Will you conduct subgroup analyses to see whether the results are affected by sex, age, or some other factor?
Consider a simplified, hypothetical case where you need to make five analysis decisions and have five justifiable options for each decision – this alone yields 3125 (\(5^5\)) unique ways to analyze your data! Now imagine you read a paper reporting the result of just one of these analyses, without any disclosure that the other justifiable choices would have yielded very different results. This undisclosed flexibility should dramatically decrease your confidence in the reported finding.
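The combinatorial explosion is easy to verify for yourself. The sketch below enumerates five hypothetical decisions with five options each; the decision names and options are purely illustrative, not drawn from any particular study.

```python
from itertools import product

# Hypothetical analysis decisions; the names and options are illustrative,
# not drawn from any particular study.
decisions = {
    "stopping rule (n participants)": [20, 50, 100, 200, 2000],
    "outlier rule": ["none", "±2 SD", "±3 SD", "IQR fence", "winsorize"],
    "dependent variable": ["raw", "log", "z-score", "rank", "trimmed mean"],
    "covariates": ["none", "age", "sex", "age + sex", "site"],
    "subgroup": ["all", "men", "women", "younger", "older"],
}

# Every unique path through these choices is one analysis pipeline.
pipelines = list(product(*decisions.values()))
print(len(pipelines))  # 5 ** 5 = 3125
```

Each element of `pipelines` is one complete route through the garden of forking paths; a reader shown only one of them has no way to know how many others were possible.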
In this chapter, we will find out why undisclosed flexibility in the design, analysis, reporting, and interpretation of experiments can lead to scientists fooling themselves and fooling each other. Collectively, these research decisions are known as “researcher degrees of freedom”. Decisions that relate to transforming data into evidence are known as “evidentiary degrees of freedom” and decisions that relate to the interpretation of the evidence are known as “interpretive degrees of freedom” (Hardwicke & Wagenmakers, 2021). We will also learn about how preregistration – the process of writing down and registering your design and analysis decisions before you observe study outcomes – (and other tools) can be used to protect our research from bias and provide the transparency that other scientists need to properly evaluate and interpret our work.
Our bottom line is that the best practice is to document your experiment – including critical design, sampling, and analysis decisions – before collecting data. This documentation can help you think through your choices to ensure that they are maximally aligned with your goals. Further, the documentation can be time-stamped using an external registry and shared so as to show which decisions were post hoc (after observing study outcomes) and which were made in advance.
Figure 11.3: The garden of forking paths: many justifiable but different analytic choices are possible for an individual dataset.
One way to visualize evidentiary degrees of freedom is as a vast decision tree or “garden of forking paths” [Gelman & Loken (2014); Figure 11.3]. Each node represents a decision point in the analysis process and each branch represents a justifiable choice. Each unique pathway through the garden terminates in an individual research outcome.
Because scientific observations typically consist of both noise (random variation unique to this sample) and signal (regularities that will reoccur in other samples), some of these pathways will inevitably lead to outcomes that are misleading (e.g., inflated effect sizes, exaggerated evidence, or false positives). The signal-to-noise ratio is worse in situations (alas, common in psychology) that involve small effect sizes, high variation, and large measurement errors (Ioannidis, 2005). Evidentiary degrees of freedom may be constrained to some extent by strong theory (Oberauer & Lewandowsky, 2019), community methodological norms and standards, or replication studies, though these constraints may be more implicit than explicit, and can still leave plenty of room for flexible decision-making. The more potential paths there are in the garden that you might explore, the higher the chance of encountering misleading outcomes.
Statisticians refer to this issue as a multiplicity (multiple comparisons) problem. As we talked about in Chapter 6, multiplicity can be addressed to some extent with statistical countermeasures, like the Bonferroni correction; however, these adjustment methods need to account for every path that you could have taken (de Groot, 1956/2014; Gelman & Loken, 2014). When you navigate the garden of forking paths during the data analysis process, it is easy to forget – or even be unaware of – every path that you could have taken, so these methods can no longer be used effectively.
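The arithmetic behind the multiplicity problem is worth seeing directly. Under the standard textbook idealization that the tests are independent (real analysis paths are usually correlated), the family-wise error rate grows quickly with the number of tests, and the Bonferroni correction (testing each at \(\alpha / k\)) restores it:

```python
# Family-wise error rate (FWER) for k independent tests at alpha = .05,
# before and after a Bonferroni correction. Idealization: independence.
alpha = 0.05
fwer = {}
for k in [1, 5, 20, 100]:
    fwer[k] = 1 - (1 - alpha) ** k        # P(at least one false positive)
    fwer_bonf = 1 - (1 - alpha / k) ** k  # same, but each test at alpha / k
    print(f"{k:>3} tests: FWER = {fwer[k]:.3f}  (Bonferroni: {fwer_bonf:.3f})")
```

With 20 tests the uncorrected chance of at least one false positive is already about 64%. But note that the correction is computed from `k`, the total number of tests: if you forget, or never knew, how many paths you actually explored, there is no `k` to plug in.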
When a researcher navigates the garden of forking paths during data analysis, their decisions can also be biased – they are not only making choices about analysis, they are doing so on the basis of how those choices affect the research outcomes (outcome-dependent decision making). If a researcher is seeking a particular kind of outcome (which is likely – see the depth box below), then they are more likely to follow the branches that steer them in that direction.
You could think of this a bit like playing a game of “hot (🔥) or cold (☃️)”, where 🔥 indicates that a choice moves the researcher closer to a desirable overall outcome and ☃️ indicates that it moves them further away. Each time the researcher reaches a decision point, they try one of the branches and get feedback on how that choice affects the outcome. If the feedback is 🔥, they take that branch. If it is ☃️, they try a different branch. If they reach the end of a complete pathway and the outcome is ☃️, maybe they even retrace their steps and try different branches earlier in the pathway. This strategy creates a risk of bias because the research outcomes are being systematically skewed towards the researcher’s preferences (Hardwicke & Wagenmakers, 2021). We say “risk of bias” rather than just “bias” because in most scientific contexts we do not have a known ground truth to compare the outcomes to, so in any specific situation we do not know the extent to which outcome-dependent analyses have actually biased the outcomes.
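A small Monte Carlo simulation shows why this “hot or cold” strategy is so dangerous. As a deliberately simple idealization, suppose each branch a researcher tries yields an independent p-value that is uniform under the null (real branches are correlated, so the true inflation would be somewhat smaller), and the researcher stops at the first branch that comes up 🔥 (p < .05):

```python
import random

# Idealized "hot or cold" simulation: under the null (no true effect),
# try up to 5 branches and report the first one with p < .05.
# Each branch's p-value is modeled as an independent uniform draw.
random.seed(1)
n_sims, n_branches, alpha = 100_000, 5, 0.05

hits = sum(
    any(random.random() < alpha for _ in range(n_branches))  # stop at first 🔥
    for _ in range(n_sims)
)
rate = hits / n_sims
print(rate)  # close to 1 - 0.95**5 ≈ 0.226, far above the nominal 0.05
```

Even with only five branches, a researcher following the 🔥/☃️ feedback reports a “significant” result in roughly a quarter of null datasets, while sincerely believing they ran just one analysis.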
Figure 11.6: By deliberately exploiting analytic flexibility in the processing pipeline of fMRI data, Bennett et al. (2009) were able to identify ‘brain activity’ in a dead Atlantic Salmon.
In the most egregious cases, a researcher may try multiple pathways until they obtain a desirable outcome and then selectively report that outcome, neglecting to mention the several other analysis strategies they tried. As the saying goes, “if you torture the data long enough, it will confess” (Good, 1972). You may remember an example of this practice from Chapter 3, where participants apparently became younger when they listened to “When I’m 64” by The Beatles. Another example of how damaging the garden of forking paths can be comes from the “discovery” of brain activity in a dead Atlantic Salmon! Researchers deliberately exploited flexibility in the fMRI analysis pipeline and avoided multiple comparisons corrections, allowing them to find apparent brain activity where there was only a dead fish [Figure 11.6; Bennett et al. (2009)].
In addition to evidentiary degrees of freedom, there is further flexibility in how researchers explain research results. As we discussed in Chapter 2, theories can accommodate even conflicting results in many different ways – for example, by positing auxiliary hypotheses that explain why a particular datapoint is special. We might call these different routes for accommodating theory with data “interpretive degrees of freedom”.
The practice of selecting or developing your hypothesis after seeing the study outcomes has been called “Hypothesizing After the Results are Known”, or “HARKing” (Kerr, 1998). HARKing is potentially problematic because it expands the garden of forking paths and helps to justify the use of various evidentiary degrees of freedom (Figure 11.7). For example, you may come up with an explanation for why an intervention is effective in men but not in women in order to justify a post-hoc subgroup analysis based on sex (see Case Study). The extent to which HARKing is problematic is contested (for discussion see Hardwicke & Wagenmakers, 2021). But at the very least it’s important to be honest about whether hypotheses were developed before or after observing research outcomes.
But hang on a minute! Isn’t it a good thing to seek out interesting results if they are there in the data? Shouldn’t we “let the data speak”? The answer is yes! Exploratory research is not the same as p-hacking. P-hacking – covertly trying analyses until a desirable result appears and reporting only that one – is explicitly dishonest because it involves deliberately withholding information. In contrast, exploratory data analysis, reported transparently, is a critical part of the scientific process.
The important things to remember about exploratory research are that you need to (1) be aware of the increased risk of bias and calibrate your own confidence in the outcomes accordingly; and (2) be honest with other researchers about your analysis strategy so that they too are aware of the risk of bias and can calibrate their confidence accordingly. It’s important to understand the distinction between exploratory and confirmatory research modes (in practice, an individual study may contain both exploratory and confirmatory aspects, which is why we describe them as different “modes”). Confirmatory research involves making design and analysis decisions before research outcomes have been observed. In the next section, we will learn how to do that using preregistration.
You can counter the problem of undisclosed researcher degrees of freedom by making research decisions before you are aware of the research outcomes – like planning your route through the garden of forking paths before you start your journey (Hardwicke & Wagenmakers, 2021; Wagenmakers et al., 2012).
Preregistration is the process of declaring your research decisions in a public registry before you analyze (and often before you collect) the data. Preregistration ensures that your decisions are outcome-independent, which reduces the risk of bias arising from the issues described above. It also transparently conveys to others what you planned, helping them to assess the risk of bias and calibrate their confidence in the research outcomes. In other words, preregistration provides the context needed to properly evaluate and interpret research, and it dissuades researchers from engaging in questionable research practices like p-hacking and undisclosed HARKing, because they can be held accountable to their original plan.
Preregistration does not require that you specify all research decisions in advance, only that you are transparent about what was planned and what was not. This transparency helps to distinguish which aspects of the research were exploratory and which were confirmatory (Figure 11.8). All else being equal, we should have more confidence in confirmatory findings, because there is a lower risk of bias. Exploratory analyses have a higher risk of bias, but they are also more sensitive to serendipitous (unexpected) discoveries. Exploratory and confirmatory research are both valuable activities – it is just important to differentiate them (Tukey, 1980)! Preregistration offers the best of both worlds by clearly separating one from the other.
In addition to the benefits described above, preregistration may improve the quality of research by encouraging closer attention to study planning. We’ve found that the process of writing a preregistration really helps facilitate communication between collaborators, and can catch addressable problems before time and resources are wasted on a poorly designed study. Detailed advance planning can also create opportunities for useful community feedback, particularly in the context of Registered Reports (see Depth box below), where dedicated peer reviewers will evaluate your study before it has even begun.
High-stakes studies such as medical trials must be preregistered (Dickersin & Rennie, 2012). In 2005, a large international consortium of medical journals decided that they would not publish unregistered trials. The discipline of economics also has strong norms about study registration (see e.g. https://www.socialscienceregistry.org). But preregistration is actually pretty new to psychology (B. A. Nosek et al., 2018), and there’s still no standard way of doing it – you’re already at the cutting edge!
We recommend using the Open Science Framework (OSF) as your registry. OSF is one of the most popular registries in psychology and you can do lots of other useful things on the platform to make your research transparent, like sharing data, materials, analysis scripts, and preprints. On the OSF it is possible to “register” any file you have uploaded. When you register a file, it creates a time-stamped, read-only copy, with a dedicated link. You can add this link to articles reporting your research.
One approach to preregistration is to write a protocol document that specifies the study rationale, aims or hypotheses, methods, and analysis plan, and register that document. You can think of a study protocol a bit like a research paper without a results and discussion section (here’s an example from one of our own studies: https://osf.io/2cnkq/). The OSF also has a collection of dedicated preregistration templates that you can use if you prefer. These templates are often tailored to the needs of particular types of research. For example, there are templates for general quantitative psychology research (“PRP-QUANT”; Bosnjak et al., 2021), cognitive modelling (Crüwell & Evans, 2021), and secondary data analysis (Akker et al., 2019). The OSF interface may change, but currently this guide provides a set of steps to create a preregistration.
Once you’ve preregistered your plan, you just go off and run the study and report the results, right? Well hopefully… but things might not turn out to be that straightforward. It’s quite common to forget to include something in your plan or to have to depart from the plan due to something unexpected. Preregistration can actually be pretty hard in practice (B. A. Nosek et al., 2019)!
Don’t worry though – remember that the primary goal of preregistration is transparency to enable others to evaluate and interpret our work. If you decide to depart from your original plan and conduct outcome-dependent analyses, then this decision may increase the risk of bias. But if you communicate this decision transparently to your readers, they can appropriately calibrate their confidence in the research outcome. You may even be able to run both the planned and unplanned analyses as a robustness check (see Box) to evaluate the extent to which this particular choice impacts the outcomes.
When you report your study, it is important to distinguish between what was planned and what was not. If you ran a lot of outcome-dependent analyses, then it might be worth having separate exploratory and confirmatory results sections. On the other hand, if you mainly stuck to your original plan, with only minor departures, then you could include a table (perhaps in an appendix) that outlines these changes (for example, see Supplementary Information A of this article).
We’ve advocated here for preregistering your study plan. This practice allows us to minimize bias caused by outcome-dependent analysis (the “garden of forking paths” that we described). Preregistration is a “plan, not a prison”: in most cases preregistered, confirmatory analyses coexist with exploratory analyses. Both are an important part of good research – the key is to disclose which is which!