> My gripe is not limited to DL/ML/AI scope either. When I try to compare my results with the papers I cite, I generally can't find the formulae or the detailed method needed to reproduce the numbers the paper claims, and this leaves us in the dark.
I agree. So many don't even bother including hyperparams, even when they publish the code. The GitHub Issues for their code are littered with questions asking about hyperparams.
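Even something as small as dumping the exact settings next to the run outputs would go a long way. A minimal sketch of what I mean (the hyperparameter names and the run directory here are just illustrative, not tied to any particular paper or codebase):

```python
import json
from pathlib import Path

# Illustrative hyperparameters -- placeholder names and values,
# not taken from any specific paper or repository.
hparams = {
    "learning_rate": 3e-4,
    "batch_size": 128,
    "optimizer": "adam",
    "weight_decay": 1e-2,
    "epochs": 90,
    "seed": 42,
}

# Hypothetical run directory; in practice this sits next to the checkpoints/logs.
run_dir = Path("runs/exp001")
run_dir.mkdir(parents=True, exist_ok=True)

# Record the exact settings used for this run alongside its outputs,
# so anyone with the code can reproduce the reported numbers.
with open(run_dir / "hparams.json", "w") as f:
    json.dump(hparams, f, indent=2)
```

That's a few lines per experiment, and it answers most of the questions that end up in the issue tracker.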
> All I can do is say "Paper23 cites these results, and we surpass them at this, we are even at that, and they are better at the other thing", which I'm not comfortable doing.
If you are achieving better results on the same dataset, you are not cheating in any way, and others can reproduce your results, then I don't see what's wrong with saying you got a better result.
Issues do arise when you are using better hardware, with more parameters or larger batch sizes than the original authors could have attempted. I suspect this accounts for the improvements claimed in many papers.