Excellent. Many social science/psychology studies are like this, and worse. One memorable study that stays top of mind because it was so awful was on CBT for depression. It concluded that CBT was effective and should be considered the gold standard, yet the paper itself stated that half the participants achieved nil outcomes and the other half only minor or indeterminate ones. So much utter garbage gets published, even in top journals, and then thoroughly pollutes clinical practice.
Blocks of size 5? Planned sample size of 50 but randomized 149? Yikes! Some gems in that "I bet nobody read this far" bulleted list! (In all seriousness - great post!)
Great piece!
Some other things might be worth noting about this study. The authors report that there are no statistically significant differences at baseline, based on the p-values presented in Table 1. They state that values are reported as mean ± SD and that the p-values come from chi-squared tests or t-tests. However, rerunning the reported tests shows that most (all?) of the p-values are incorrect. The one that sprang out at me was the p-value for the difference in stool frequency, where the FODMAP diet group had a mean roughly 1 SD lower than the PD group; rerunning this test in Stata gives me a value of 0.0036. Even allowing for rounding errors in the authors' favour, that is a statistically significant difference.
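For anyone who wants to redo this kind of check themselves, a baseline t-test can be recomputed from nothing more than the reported means, SDs, and group sizes. Below is a minimal sketch of that recomputation in Python with scipy (rather than Stata); the means, SDs, and group sizes are placeholder values chosen for illustration, not the figures from the paper's Table 1.

```python
# Recompute a two-sample t-test from reported summary statistics alone
# (mean, SD, and n per group). The numbers below are illustrative
# placeholders, NOT the actual values from the paper's Table 1.
from scipy import stats

# Hypothetical baseline stool frequency, reported as mean ± SD per group
mean_fodmap, sd_fodmap, n_fodmap = 4.1, 1.0, 75   # placeholder
mean_pd, sd_pd, n_pd = 5.1, 1.0, 74               # placeholder

# Student's t-test with equal variances assumed, as baseline tables usually use
t_stat, p_value = stats.ttest_ind_from_stats(
    mean_fodmap, sd_fodmap, n_fodmap,
    mean_pd, sd_pd, n_pd,
    equal_var=True,
)
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
```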
Meanwhile, all of the categorical p-values in this table are wrong. The authors report almost exactly the same percentage of females in both groups, yet give a p-value of 0.25 (the correct value is 0.931 from a chi-squared test, or 1 from Fisher's exact test). It's also worth noting that a couple of values in Table 1 don't pass the GRIM test, which calls into question whether the reported sample size is accurate for this table.
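The categorical comparisons and the GRIM check can be redone just as easily. The sketch below assumes a hypothetical 2×2 table of female/male counts and a hypothetical reported mean and group size; the GRIM logic itself is just the standard check that a mean of n integer-valued responses must equal some integer sum divided by n, to the reported precision.

```python
# Re-check a categorical baseline comparison and run a GRIM check.
# The 2x2 counts and the example mean/n below are hypothetical placeholders,
# not values taken from the paper's Table 1.
import math
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table: rows = groups (FODMAP, PD), columns = (female, male)
table = [[55, 20],
         [54, 20]]
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=False)
odds_ratio, p_fisher = fisher_exact(table)
print(f"chi-squared p = {p_chi2:.3f}, Fisher's exact p = {p_fisher:.3f}")

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM check: can a mean of n integer-valued responses round to reported_mean?"""
    candidate_sums = (math.floor(reported_mean * n), math.ceil(reported_mean * n))
    return any(
        round(s / n, decimals) == round(reported_mean, decimals)
        for s in candidate_sums
    )

# Hypothetical reported means for a group of n = 75: the first is achievable,
# the second is not (no integer sum over 75 responses rounds to 4.54).
print(grim_consistent(4.53, 75))  # True
print(grim_consistent(4.54, 75))  # False
```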
All in all, not the sort of study that I'd rely on as evidence!
I'm shocked, shocked to hear this I tell you. But better you than me doing the "deep dive" 😜
Ahem! "there is zero chance anyone has read this far anyway".
🙏👊
Terrance and Phillip! (I did actually read the whole article and it's great, just enjoyed the "obscure Canadian celebrity" reference in the video.)
This is an example of the now well-known problem that "the difference between significant and non-significant is not itself significant" (what I call the indirect significance fallacy); a small numerical sketch follows below.
(P.S. My current desk at work used to belong to the son of the hydroxychloroquine doctor guy :) )
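For readers who haven't met that fallacy before, here is a small numerical sketch with made-up effect estimates and standard errors, treated as independent: one comparison is comfortably significant, the other isn't, yet the difference between the two estimates is nowhere near significant (Gelman & Stern, 2006).

```python
# Numerical illustration of "the difference between significant and
# non-significant is not itself significant" (Gelman & Stern, 2006).
# The effect estimates and standard errors are made up for illustration
# and the two estimates are treated as independent.
from math import sqrt
from scipy.stats import norm

def two_sided_p(estimate, se):
    """Two-sided p-value for an estimate with a normal standard error."""
    z = estimate / se
    return 2 * norm.sf(abs(z))

est_a, se_a = 2.5, 1.0   # comparison A: "significant"
est_b, se_b = 1.3, 1.0   # comparison B: "not significant"

print(f"A: p = {two_sided_p(est_a, se_a):.3f}")   # ~0.012
print(f"B: p = {two_sided_p(est_b, se_b):.3f}")   # ~0.194

# The direct comparison of the two estimates is nowhere near significant,
# even though one was "significant" and the other wasn't.
diff = est_a - est_b
se_diff = sqrt(se_a**2 + se_b**2)
print(f"A vs B: p = {two_sided_p(diff, se_diff):.3f}")  # ~0.396
```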
Thank you so much for a great post and a great example of why this is an issue! Working for the Danish Medical Research Ethics Committees, we would certainly have a question or two for researchers who enrol three times the planned number of patients without an amendment and a justification!
It's very odd!