I know nearly nothing about BDNF specifically. Whether it should motivate a follow-up is mostly something only the authors can know, since p = .046 suggests they may have tested numerous outcome variables and reported only one (e.g., this could very well be 1 of 10). The fact that the p-value is almost comically close to .05 makes me suspect that this happened. Perhaps, if this is in line with other BDNF research, that could motivate some further work.
Notably, even if we take this p = .046 as a given and assume there was no p-hacking, this type of result implies that statistical power is tiny, and a proper "bigger population" study would likely need several hundred people. Even a study with 50% power should have a majority of its significant results land at p < .01.
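For anyone who wants to check the 50% power claim, here is a rough sketch. It assumes a simple two-sided z-test with alpha = .05, which is a stand-in for illustration and not the paper's actual interaction model; the function names are made up for this example.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def prob_p_below(alpha, delta):
    """P(two-sided p < alpha) when the z statistic is N(delta, 1).

    delta is the true standardized effect (an assumption of this sketch).
    """
    z_crit = STD.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD.cdf(z_crit - delta)
    lower_tail = STD.cdf(-z_crit - delta)
    return upper_tail + lower_tail


def delta_for_power(power, alpha=0.05):
    """Find the delta that yields the requested power, by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if prob_p_below(alpha, mid) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


delta = delta_for_power(0.50)
significant = prob_p_below(0.05, delta)

# Among significant results, the share that land below .01
share_below_01 = prob_p_below(0.01, delta) / significant

# Among significant results, the share that land between .04 and .05
share_04_to_05 = (significant - prob_p_below(0.04, delta)) / significant

print(f"share of significant results with p < .01: {share_below_01:.3f}")
print(f"share of significant results with .04 < p < .05: {share_04_to_05:.3f}")
```

Under these assumptions, roughly 54% of significant results fall below .01, and only about 7% land between .04 and .05. Raising the `0.50` passed to `delta_for_power` makes a p-value this close to .05 rarer still.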
Agree that this is definitely an assumption one needs to make. It could easily be that BDNF was one variable among many unreported ones, and that would be consistent with the other outcome variables in the paper, so it seems plausible.
> this type of result implies that statistical power is tiny,
Yes, definitely, BUT the effect in question is an interaction effect, so power's just going to be small from the nature of the design. I was definitely thinking that you'd be looking at a follow-up study of multiple hundreds of people to confirm something like this. I'm realizing that thinking this is a trivial follow-up is the difference between someone who might actually work on real experiments and someone who just works with the numbers.
Just want to re-emphasize though that the thing which makes me give this result (some) credence (assuming it's not a desk drawer p-hack) is just the distributions of the observation variable for the two treatment groups. Like even if the means of the BDNF increase are equal between the two arms of the trial, and this p-value is a false positive (which, as you say, seems very possible), there are still clearly some other differences between the groups. I strongly suspect a quantile regression on the p50 or p75, rather than an ANOVA on the means, would show a 'more significant' effect; heck, even just a log-linear model or something seems like it would be an improvement, since there's clearly some skew in the 'Dance' population.