I don't think "wrong" is the word we're looking for here. But your example above is bullshit: nobody puts 1000 patients at risk in a phase I (safety) trial, and if the dose isn't reasonably well calibrated by the time of the phase III study you're describing, someone's going to jail. In phase II we often have stopping rules for exactly this reason, in case the small phase I sample happened to be biased.
There are a number of things to notice above:
1) The phasing approximates Thompson sampling to a degree, in that large late-phase trials MUST follow smaller early-phase trials. Nobody is going to waste patients on SuperMab (look it up).
2) The endpoints are hard, fast, and pre-specified:
IFF we have N or more adverse events in M patients, we shut down the trial for toxicity.
IFF we have X or more complete responses in Y patients, we shut down the trial because it would be unethical to deprive the control arm.
IFF we have Z or fewer responses in the treatment arm, then given our ultimate accrual goal (total sample size) it will be impossible to conclude, using the test we have selected and preregistered, that the new drug isn't WORSE than the standard, so we shut it down for futility. Those patients will be better served by another trial.
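For a sense of how a threshold like "N adverse events in M patients" might be fixed before the trial starts, here is a minimal sketch assuming a simple exact-binomial rule: stop if the observed adverse-event count would be unlikely (tail probability below `alpha`) if the true toxicity rate were at some acceptable level. The helper names `binom_tail` and `toxicity_boundary` and the parameters `p_acceptable` and `alpha` are hypothetical; real protocols often use Bayesian or group-sequential boundaries instead.

```python
from math import comb


def binom_tail(k: int, m: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(m, p), computed exactly."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))


def toxicity_boundary(m: int, p_acceptable: float, alpha: float = 0.05) -> int:
    """Smallest N such that N or more adverse events in m patients would have
    probability below alpha if the true toxicity rate were p_acceptable.

    Hypothetical helper: returns m + 1 (i.e. "never stop on this rule") if
    even m events would not be surprising at that rate.
    """
    for n in range(m + 1):
        if binom_tail(n, m, p_acceptable) < alpha:
            return n
    return m + 1


if __name__ == "__main__":
    # Illustrative values: 20 patients, 10% toxicity considered acceptable.
    print(toxicity_boundary(20, 0.10))  # → 5
```

The point is that N is computed and written down before anyone sees data; the efficacy (X of Y) and futility (Z) thresholds would be fixed the same way, just with different calculations.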
You are massively oversimplifying a well-understood problem. Decision theory is a thing, and it's been a thing for 100 years. Instead of lighting your strawman on fire, how about reframing it?
Stopping isn't "always" wrong, but stopping because you've happened to hit some extremal value is pretty much always biased. The winner's curse, regression to the mean: all of these happen because people forget about sampling variability. It's also why point estimates (even test statistics), as opposed to posterior distributions, are misleading. If you're going to stop at an uncertain time or for unspecified reasons, you need to include that "slop" in your estimates.
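The bias from stopping at an extremal value shows up clearly in a toy simulation. This sketch assumes unit-variance normal observations, a small true effect of 0.1, and a naive rule that peeks after every observation (from the 10th on) and stops as soon as the one-sided z-statistic exceeds 1.96; all of these values are illustrative assumptions, not anything from a real trial. Averaging the estimate over the runs that stopped on the threshold gives a number above the true effect: the winner's curse in miniature.

```python
import random
import statistics


def run_trial(true_effect: float, max_n: int, rng: random.Random,
              z_stop: float = 1.96, min_n: int = 10) -> tuple[bool, float]:
    """Sequential one-sample experiment with N(true_effect, 1) observations.

    Peeks after every observation from min_n onward and stops the first time
    the z-statistic (known sigma = 1) exceeds z_stop. Returns
    (stopped_on_threshold, estimated_mean_at_stop).
    """
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(true_effect, 1.0)
        mean = total / n
        if n >= min_n and mean * n ** 0.5 > z_stop:
            return True, mean
    return False, total / max_n


def mean_estimate_when_stopped(true_effect: float = 0.1, max_n: int = 500,
                               trials: int = 1000, seed: int = 1) -> tuple[int, float]:
    """Run many trials; return (# that stopped on the threshold, their mean estimate)."""
    rng = random.Random(seed)
    results = (run_trial(true_effect, max_n, rng) for _ in range(trials))
    estimates = [est for stopped, est in results if stopped]
    return len(estimates), statistics.mean(estimates)


if __name__ == "__main__":
    n_stopped, avg = mean_estimate_when_stopped()
    print(f"{n_stopped} of 1000 runs stopped on the threshold; "
          f"average estimate {avg:.3f} vs true effect 0.1")
```

Every run that stops has, by construction, an estimate above 1.96/sqrt(n), so the runs that stop early are exactly the ones where noise pushed the estimate up.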
"We estimate that the new page is 2x (95% CI, 1.0001x-10x) more likely to result in a conversion"... hey, you stopped early and at least you're being honest about it... but if we leave out the uncertainty then it's just misleading.
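Putting the interval next to the headline ratio is cheap. A minimal sketch, assuming two independent arms with binomial conversions and using the standard Wald interval on the log risk ratio (other intervals behave better with small counts); the function name `risk_ratio_ci` and the counts in the usage line are made up for illustration.

```python
import math


def risk_ratio_ci(conv_new: int, n_new: int, conv_old: int, n_old: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio p_new / p_old with a Wald interval on the log scale.

    Assumes two independent arms with binomial outcomes and at least one
    conversion in each arm (the log-scale standard error is undefined otherwise).
    """
    if conv_new == 0 or conv_old == 0:
        raise ValueError("need at least one conversion in each arm")
    rr = (conv_new / n_new) / (conv_old / n_old)
    se_log = math.sqrt(1 / conv_new - 1 / n_new + 1 / conv_old - 1 / n_old)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi


if __name__ == "__main__":
    # Made-up counts: 20/500 conversions on the new page, 10/500 on the old one.
    rr, lo, hi = risk_ratio_ci(20, 500, 10, 500)
    print(f"{rr:.1f}x (95% CI, {lo:.2f}x-{hi:.2f}x)")
```

With those made-up counts the headline is "2x", but the interval runs from roughly 0.95x to 4.2x, i.e. it doesn't even exclude "no difference". That is the part that gets dropped when only the point estimate is reported.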
All of the above is taken into account when designing trials because not only do we not like killing people, we don't like going to jail for stupid avoidable mistakes.
My point is that you can still extract useful information when your stopping point is dynamic rather than static. One typical scenario is when your effect size ends up being larger than you originally guessed. There's little reason to continue if the difference becomes obvious.
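One standard way to allow a dynamic stop without inflating false positives is to pre-specify the interim boundary. The sketch below uses a Haybittle-Peto-style rule, swapped in here as an illustration rather than something stated in the thread: stop at an interim look only if |z| > 3, and use the usual 1.96 at the final look. Under the null hypothesis, with five equally spaced looks of 100 observations each (illustrative assumptions), it compares that rule against naively testing at 1.96 at every look. The helper names `rejects` and `type_i_error` are hypothetical.

```python
import math
import random


def rejects(z: float, look: int, n_looks: int,
            interim_z: float, final_z: float) -> bool:
    """Two-sided boundary check: interim_z before the last look, final_z at it."""
    threshold = final_z if look == n_looks else interim_z
    return abs(z) > threshold


def type_i_error(interim_z: float, final_z: float = 1.96, n_looks: int = 5,
                 batch: int = 100, sims: int = 20000, seed: int = 7) -> float:
    """Simulated false-positive rate under the null (true effect = 0)."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(sims):
        s = 0.0
        for look in range(1, n_looks + 1):
            # Sum of `batch` new N(0, 1) observations, drawn in one step.
            s += rng.gauss(0.0, math.sqrt(batch))
            z = s / math.sqrt(look * batch)
            if rejects(z, look, n_looks, interim_z, final_z):
                false_positives += 1
                break
    return false_positives / sims


if __name__ == "__main__":
    naive = type_i_error(interim_z=1.96)          # same 1.96 at every look
    haybittle_peto = type_i_error(interim_z=3.0)  # strict interim, 1.96 at the end
    print(f"naive: {naive:.3f}, Haybittle-Peto: {haybittle_peto:.3f}")
```

The naive rule's false-positive rate comes out near 14%, while the strict interim boundary stays close to the nominal 5%. A genuinely large effect will still cross |z| > 3 early, so you keep the option to stop when the difference is obvious.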
In the future, I would appreciate it if you steelmanned my comments or asked for clarification instead of insulting me. It hurt my feelings. I wish I had written a better comment that hadn't incited such a reaction from you. Best wishes.
You are right, I shot from the hip. Sorry about that.
I also have noprocrast set in my profile so I couldn't go back and edit it (something I thought about doing). I probably would have toned it down if I hadn't requested that Hacker News kick me off after 15 minutes.
Your line of discussion is productive. It's just important that people understand the difference between degrees of belief and degrees of evidence from a specific study and never confuse the two. Trouble is, lots of folks confuse them, and lots of other folks prey on that confusion.
No worries and thanks for the apology. I apologize for my own comments in this thread, which were lower than the quality I aspire to. I had pulled an all-nighter for work and was sitting grumpily with my phone at an airport.