The most prominent funder of health research in Ireland, the Health Research Board, will soon open the next round of the Definitive Intervention and Feasibility Awards (2023 DIFA call info here). This is their main funding call for investigator-led randomized controlled trials. As one of the few experienced, university-based trial statisticians in Ireland, part of my job is to support exactly these kinds of studies. So in preparation for the very welcome onslaught of emails I’ll get from clinical investigators wanting to discuss their applications, I wanted to share some considerations about trial design with the wider community and suggest a few good papers you might read on these topics. These are based on my own experiences supporting trials, including two ongoing studies that were funded through the DIFA scheme, as well as material from the two postgraduate modules on clinical trial design and analysis that I teach at UCC.
Yes, you actually need a statistician
Successful clinical trials require teams with a variety of expertise. Though many scientists and clinical investigators receive some training in statistics and study design, this training is usually rudimentary, and its application often clumsy. The trial statistician is thus needed to provide expert guidance in these areas, and so should be heavily involved in the design, analysis, and reporting phases of the project. However, to do their job properly, the statistician must rely on the subject matter expertise of the clinical investigators. Thus the scientific goals of clinical trials are best met through close collaborations in which clinical investigators and statisticians learn from one another.
This may sound self-serving (me, the statistician, saying that you need a trial statistician), but there is a reason that ICH-GCP says that any regulated CTIMP must have a qualified trial statistician involved. Further, the Declaration of Helsinki makes clear that all medical research should be rigorous and methodologically sound and avoid research waste. At the end of the day, poorly designed trials and flawed analyses can and do cause patient harm, so please make sure you seek out the right support for your study.
On this point, while I’m happy to have a chat with anyone, based anywhere, it’s really important to build relationships with local trial supports. So if you aren’t blessed to live in Cork, there are good trial statisticians at UG, UL, UCD, Trinity and RCSI that I can put you in touch with if needed. One of them, or someone with equivalent experience and expertise, should be named on your DIFA, end of.
Selected reading:
The Statistician's Role in Developing a Protocol for a Clinical Trial
World Medical Association Declaration of Helsinki
The scandal of poor medical research
Comparators, equipoise and ethics
When we run an RCT, we are exposing patients to unknown risks in the hope of learning something important about a treatment that might benefit future patients, as well as those enrolled in the trial. For this to be ethical, it means that, at a minimum, all patients enrolled in the trial must get at least the same quality of care as they would have received had they not enrolled. It also means that there should be genuine uncertainty about the potential benefits of the new intervention, so that we are in a state of equipoise. Equipoise can be hard to demonstrate, so it’s important for investigators to clearly make the case that it exists (though this often doesn’t happen). The ethical obligations of the trial also mean that we should stand to learn something important, which means that shoddy comparators (known to be substandard, thus stacking the deck in favour of the new treatment) and other preventable design flaws are arguably unethical. So one of the first things you will need to establish will be the current standard of care and how you will ensure it is maintained for every patient who might enroll in the trial you are planning.
Selected reading:
Equipoise in Research - Integrating Ethics and Science in Human Research
Is the concept of clinical equipoise still relevant to research?
Choice of control group in randomised trials of cancer medicine: are we testing trivialities?
Outcomes
We define the effects of interventions in terms of their impact on outcomes. In other words, outcomes are the variables we want to change in response to an intervention. So first and foremost, we must ensure that the outcomes we use in a trial are actually important, and be cautious about using so-called surrogate outcomes, which may reflect that the intervention did something, but not necessarily what we wanted it to do. Outcomes must also be precisely defined and, of course, measurable. Your description of an outcome should also avoid any qualitative statements (e.g. systolic blood pressure is an outcome, while “improved” systolic blood pressure is not).
Our choice of outcomes will also have important implications for the overall design of the trial. Generally, outcomes that are noisier (have more natural variance, which is typical of subjective measures), or rarer, will require a larger sample of patients to demonstrate the effect of an intervention. That said, we must often accommodate this if those are in fact the most important outcomes. However, one avoidable but still frequently made mistake is the categorization of inherently continuous outcomes, which always results in a loss of information and needlessly lowers the power of the study (and even if some categorized outcome is perceived as being more relevant, it can always be derived from the analysis of the underlying continuous outcome).
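To make the cost of categorization concrete, here is a minimal simulation sketch in Python (numpy and scipy assumed available) comparing the power of analysing a continuous outcome directly versus dichotomising it at a cut-point. The effect size, cut-point, and sample size are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2023)

n_per_arm = 100   # assumed sample size per arm
effect = 0.4      # assumed standardised mean difference
cutoff = 0.5      # assumed "responder" threshold on the standardised scale
n_sims = 2000

p_cont, p_binary = [], []
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(effect, 1.0, n_per_arm)

    # Analysis 1: t-test on the raw continuous outcome
    p_cont.append(stats.ttest_ind(treated, control).pvalue)

    # Analysis 2: chi-squared test after dichotomising at the cut-point
    table = [[np.sum(treated > cutoff), np.sum(treated <= cutoff)],
             [np.sum(control > cutoff), np.sum(control <= cutoff)]]
    p_binary.append(stats.chi2_contingency(table)[1])

print("Power, continuous outcome  :", np.mean(np.array(p_cont) < 0.05))
print("Power, dichotomised outcome:", np.mean(np.array(p_binary) < 0.05))
```

Under these assumptions the dichotomised analysis loses a substantial amount of power for exactly the same patients and the same underlying effect.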
Selected reading:
The perils of surrogate endpoints
Cardiology World Erupts Into Controversy Over Change In Major Clinical Trial (outcome switching)
FDA draft guidance on multiple endpoints
Patient selection
We must carefully consider which patients to enroll in our RCT. This will largely depend on the overall goals of the trial. In the initial phase 3 trials of a new treatment, we are often most interested in seeing whether the intervention can work in a “best-case” scenario. This means that we select patients in a manner that maximizes the internal validity of the trial, with little or no consideration for external validity or generalizability. In turn, this means recruiting patients who are most likely to benefit from, and adhere to, the proposed treatment (based on our current understanding). It also means recruiting a more homogeneous sample to reduce natural variability in the outcome. This will make it easier to see the effect of the intervention, if there is one, using a smaller sample than would be required to see the same effect in a broader, more varied sample. This best-case scenario makes sense for newly tested treatments, since it costs less money and exposes fewer patients to as-yet-unknown risks, and because if we fail to see an effect under these optimal conditions, it’s probably safe to conclude that we can move on from this intervention and pursue other ideas.
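As a rough illustration of why a less variable (more homogeneous) sample needs fewer patients, here is a back-of-the-envelope sketch using the standard normal-approximation sample size formula for comparing two means. The target difference and standard deviations are purely illustrative.

```python
from scipy.stats import norm

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-arm comparison of means (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sd / delta) ** 2

# Same target difference; a more heterogeneous sample doubles the outcome SD
print(round(n_per_arm(delta=5, sd=10)))   # about 63 per arm
print(round(n_per_arm(delta=5, sd=20)))   # about 251 per arm (roughly 4x)
```

Doubling the outcome standard deviation quadruples the required sample size for the same target difference, which is why efficacy trials often recruit narrowly.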
That said, once we’ve demonstrated the efficacy of an intervention in trials with high internal validity (i.e. we’ve demonstrated that the intervention can work), we will likely want to see if it does work (paraphrasing Senn) when applied in something that looks more like normal clinical practice. This is where effectiveness (or pragmatic) trials come in, where we want more broadly representative samples and scenarios. The implication is that there might be practical issues with compliance in a broader sample, or that there might be heterogeneity in treatment effects (HTE). This means that some groups of patients will benefit more or less from the intervention than other groups, and that these different groups might have been disproportionately represented in the earlier efficacy trials we described above. Unfortunately, the sample size required to demonstrate such interactions can be much larger than that needed to demonstrate the marginal (on average) effect of the treatment, so concerns about HTE frequently go untested. Regardless, demonstrating a beneficial average treatment effect in a more generalizable sample is still very comforting, especially to the people who make health technology assessments (i.e. the people who help decide which treatments to fund with public money).
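To give a sense of how large that sample size penalty can be, here is a rough sketch (normal approximation, 1:1 allocation, and a 50/50 subgroup split assumed; all numbers illustrative) comparing the total sample size needed to detect the overall treatment effect with that needed to detect a treatment-by-subgroup interaction of the same magnitude.

```python
from scipy.stats import norm

def total_n_main(delta, sd, alpha=0.05, power=0.80):
    """Total N for the overall (marginal) treatment effect, two equal arms."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 4 * (z * sd / delta) ** 2

def total_n_interaction(delta_diff, sd, alpha=0.05, power=0.80):
    """Total N to detect a difference in treatment effect between two
    equally sized subgroups (the treatment-by-subgroup interaction)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 16 * (z * sd / delta_diff) ** 2

print(round(total_n_main(delta=5, sd=10)))              # about 126 in total
print(round(total_n_interaction(delta_diff=5, sd=10)))  # about 502 in total (4x)
```

Under these assumptions, an interaction of the same size as the overall effect needs roughly four times the total sample, which is why claims about HTE so often go unsupported.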
Importantly, even if we have reason to expect no HTE, we want to be mindful of social representation. This means working to ensure that all relevant patients have opportunities to participate in clinical trials, since patients enrolled in trials often have better outcomes than those who aren’t enrolled, even when the new treatment doesn’t out-perform standard care. Poor representation, which is unfortunately typical in clinical trials (for example, of women, or of underrepresented, minoritized people), can also degrade trust in clinical research in general, and understandably so.
Lastly, we control the composition of the patient sample with inclusion and exclusion criteria. A common mistake is to write exclusion criteria that are just the inclusion criteria stated in reverse (e.g. inclusion: age >= 65 years; exclusion: age < 65 years). It’s better to think of the inclusion criteria as controlling entry into the trial, yielding a more or less homogeneous sample depending on your aims, and precisely defining the disease or problem we are trying to impact. Exclusion criteria, which should usually be fewer, are then typically used to exclude people who can’t consent, who couldn’t plausibly benefit from the new treatment, or for whom the risk (in either arm) would be higher than acceptable. Importantly, we have a lot of evidence to suggest that patients are often needlessly excluded from trial participation, so you want to make sure that you have a solid justification for each of your entry criteria. After all, needlessly excluding patients from trials is not only arguably unethical, it also adds barriers to your recruitment efforts, and recruitment is exactly where uninformative trials tend to have fallen short.
Selected reading:
Evaluating inclusion and exclusion criteria
Why representativeness should be avoided
Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal
Covariates
There is some disagreement among trialists about how to treat covariate information. In general, model-based adjustment for strong predictors of the outcome will result in a more efficient (more powerful) estimator of the treatment effect. This means we don’t need to enroll as many patients in the trial to detect the minimally important effect size (or we have even higher power to detect this effect with the same number of patients). This part is uncontroversial. However, some people see any covariate adjustment as a problem, especially if they suspect that the choice of covariates to adjust for was made after seeing the data, with the intent to produce a small p-value (p-hacking). Other people are fine with covariate adjustment, but they choose their covariates based on perceived imbalances in the covariate distributions between trial arms (so-called “table 1 tests”). However, this procedure is sub-optimal, recommended against by all competent authorities, and opens the investigator up to the very accusations of p-hacking we just discussed.
The correct way to account for covariate information is to use your subject matter expertise and understanding of the outcome to select the strongest prognostic factors before the study begins, and to pre-register these decisions in the statistical analysis plan attached to the clinical trial registration. Then the reported analyses at the end of the study should match what was declared in the pre-registered SAP (and thus couldn’t have been p-hacked).
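To make the efficiency argument concrete, here is a minimal simulation sketch (Python, with numpy and statsmodels assumed available) comparing the standard error of the treatment effect with and without adjustment for a pre-specified, strongly prognostic covariate. The covariate strength, effect size, and sample size are all illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

n = 200
treat = rng.permutation(np.repeat([0, 1], n // 2))   # 1:1 randomisation
prognostic = rng.normal(0, 1, n)                     # strong baseline predictor
outcome = 0.3 * treat + 1.0 * prognostic + rng.normal(0, 1, n)

# Unadjusted analysis: treatment only
unadj = sm.OLS(outcome, sm.add_constant(treat)).fit()

# Adjusted analysis: treatment plus the pre-specified prognostic covariate
X = sm.add_constant(np.column_stack([treat, prognostic]))
adj = sm.OLS(outcome, X).fit()

print("Unadjusted SE:", round(unadj.bse[1], 3))
print("Adjusted SE  :", round(adj.bse[1], 3))   # smaller, i.e. more efficient
```

The adjusted estimator is unbiased either way; the gain is a smaller standard error, which is exactly what lets you target the same effect with fewer patients.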
The final point is how to handle baseline information. The baseline is a measure of the outcome taken prior to randomization. Baselines are often powerful predictors of the final outcome and thus a good choice for model-based adjustment. However, instead of this, some investigators calculate change scores (the outcome at the end of the study minus the outcome measured at baseline) and use that as the outcome in the trial analysis. What they don’t realize is that such a change score will still be correlated with the baseline values (but now in the opposite direction), and thus still benefits from an adjustment for baseline; and that the estimated effect of an intervention on the change score adjusted for baseline will be exactly the same as that on the (raw) outcome adjusted for baseline. While there are some scenarios where the unadjusted estimator of the treatment effect will be more efficient using change scores rather than the raw outcome, the baseline-adjusted estimator is always more efficient than unadjusted change scores.
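If you doubt that last claim, here is a quick numerical check (Python with numpy and statsmodels assumed; all parameter values illustrative) that the baseline-adjusted treatment effect is identical whether we model the follow-up value or the change score.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

n = 300
treat = rng.permutation(np.repeat([0, 1], n // 2))
baseline = rng.normal(50, 10, n)
followup = 0.6 * baseline + 2.0 * treat + rng.normal(0, 8, n)
change = followup - baseline

X = sm.add_constant(np.column_stack([treat, baseline]))

fit_raw = sm.OLS(followup, X).fit()    # follow-up adjusted for baseline
fit_change = sm.OLS(change, X).fit()   # change score adjusted for baseline

print(round(fit_raw.params[1], 6))     # treatment effect
print(round(fit_change.params[1], 6))  # identical treatment effect
```

Only the coefficient on baseline differs between the two models (it shifts by exactly one); the treatment effect, its standard error, and its p-value are the same.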
Selected reading:
Randomization, allocation concealment and blinding
Most of us understand the importance of randomization for preventing any selection bias when recruiting patients into the trial. However, to actually accomplish this, we must maintain strict allocation concealment, which requires that the following are true:
The allocation for a patient cannot possibly be known by the trial staff until after they are unambiguously and irreversibly enrolled onto the trial (this of course doesn't mean they can't drop out of the trial, they just can't disappear without a trace as if never enrolled).
Once a patient is allocated, their allocation cannot be altered (again, this of course doesn't mean that they can't be moved onto another treatment, just that the record of their initial allocation can't be quietly changed after the fact).
Given the importance of allocation concealment for preventing selection bias, it’s critically important that investigators use trustworthy systems that don't rely on the trustworthiness of investigators. Thus computer databases and remote services should be used in serious trials, and stuffed envelopes should generally be avoided. And once we’ve rightly gone through the effort of rigorously concealing the randomized treatment allocations, it is similarly useful to keep as many people as possible (patients, clinicians, statisticians) blinded to that allocation throughout the study, or at least for as long as possible.
Finally, with respect to the randomization list itself, it’s usually important to consider restriction and stratification. Given that estimators are more efficient when there is an even split of participants across study arms, it can be a good idea to restrict the randomization list to force an equal (or very nearly equal) allocation of patients across arms. However, in larger samples (n > 100, say), the probability of an imbalance large enough to appreciably affect the estimator's efficiency quickly becomes small.
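For illustration, here is a minimal sketch of a restricted (permuted-block) randomization list generator, using only the Python standard library. The block size, arm labels, and seed are illustrative, and in a real trial the list would live inside a secure system rather than a script like this.

```python
import random

def permuted_block_list(n_patients, block_size=4, arms=("A", "B"), seed=2023):
    """Generate a 1:1 allocation list using permuted blocks."""
    assert block_size % len(arms) == 0
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)          # randomise order within each block
        allocations.extend(block)
    return allocations[:n_patients]

print(permuted_block_list(10))
# e.g. ['B', 'A', 'A', 'B', 'A', 'B', 'B', 'A', 'A', 'B'] -- the arms can never
# drift more than half a block apart, so the final split is (near-)equal
```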
However, there is another reason to restrict a randomization list, and that is when we decide to stratify on one or more key factors that are prognostic for the outcome. There is a widespread misconception about stratification, which is that it solves the problem of key covariate imbalances across study arms. Importantly, however, this is an incomplete solution to that problem, because if we force balance in a covariate by stratifying on it, we then need to adjust for that covariate in our statistical model; not doing so would be analogous to treating matched data as if they were unmatched. In other words, you must "analyze as you randomize" (Senn). Further, adjusting for the covariate fixes the problem you were trying to solve with stratification, and it does so even if you didn't stratify in the first place. However, stratification, when feasible, is still a good idea, as it can increase the trust that your peers will have in the result.
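To illustrate the "analyze as you randomize" point, here is a small simulation sketch (numpy, scipy, and statsmodels assumed; two equally sized strata, a strongly prognostic stratification factor, no true treatment effect, and a nominal 5% significance level, all illustrative assumptions). Ignoring the stratification in the analysis gives a conservative test, while adjusting for the stratification factor restores the nominal type I error rate.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(11)

n_per_stratum = 50     # two strata, 1:1 allocation within each
n_sims = 2000
p_unadj, p_adj = [], []

for _ in range(n_sims):
    # Stratified randomisation: treatment balanced within each stratum
    stratum = np.repeat([0, 1], n_per_stratum)
    treat = np.concatenate([rng.permutation(np.repeat([0, 1], n_per_stratum // 2))
                            for _ in range(2)])

    # No true treatment effect; the stratum is strongly prognostic
    y = 2.0 * stratum + rng.normal(0, 1, 2 * n_per_stratum)

    # Unadjusted analysis (ignores how we randomised)
    p_unadj.append(stats.ttest_ind(y[treat == 1], y[treat == 0]).pvalue)

    # Adjusted analysis ("analyze as you randomize")
    X = sm.add_constant(np.column_stack([treat, stratum]))
    p_adj.append(sm.OLS(y, X).fit().pvalues[1])

print("Type I error, unadjusted:", np.mean(np.array(p_unadj) < 0.05))  # well below 0.05
print("Type I error, adjusted  :", np.mean(np.array(p_adj) < 0.05))    # close to 0.05
```

The unadjusted analysis isn't wrong so much as wasteful: its standard errors are too large for the design actually used, which is the matched-data analogy in the paragraph above.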
Selected reading: