So a suspicious player could detect the loaded dice simply by weighing them? That actually makes a lot of sense. If you look into how dice and coins are actually manufactured, the makers do not run tests like "roll a sample of dice a million times each, then check the p-value". Instead they rely on a clever manufacturing process with stringent tolerances. If they run such tests at all, it would be to find an upper bound on the deviation from fairness over the lifetime of a die (e.g. 100k rolls).
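
For anyone curious what the two kinds of roll-based test would look like, here is a rough sketch in Python. It is not anything a manufacturer is known to use. The function names, the loaded weights, and the choice of a chi-square test and a Hoeffding bound are all my own assumptions for illustration.

```python
import math
import random

# Chi-square critical value for 5 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF5_ALPHA05 = 11.070


def roll_counts(weights, n_rolls, seed=0):
    """Simulate n_rolls of a six-sided die with the given (hypothetical) face weights."""
    rng = random.Random(seed)
    counts = [0] * 6
    for face in rng.choices(range(6), weights=weights, k=n_rolls):
        counts[face] += 1
    return counts


def chi_square_stat(counts):
    """Pearson chi-square statistic against the hypothesis that all faces are equally likely."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def max_observed_deviation(counts):
    """Largest gap between an observed face frequency and the fair value 1/6."""
    n = sum(counts)
    return max(abs(c / n - 1 / len(counts)) for c in counts)


def deviation_bound(n_rolls, n_faces=6, delta=0.05):
    """Hoeffding bound with a union bound over faces: with probability at least
    1 - delta, every observed frequency is within this distance of its true probability."""
    return math.sqrt(math.log(2 * n_faces / delta) / (2 * n_rolls))


if __name__ == "__main__":
    n = 100_000  # the "lifetime of a die" figure from the comment above
    for label, weights in [("fair", [1] * 6), ("loaded", [1, 1, 1, 1, 1, 2])]:
        counts = roll_counts(weights, n)
        stat = chi_square_stat(counts)
        print(label, "chi-square:", round(stat, 2),
              "rejects fairness:", stat > CHI2_CRIT_DF5_ALPHA05,
              "max deviation:", round(max_observed_deviation(counts), 4))
    print("bound on sampling error at n rolls:", round(deviation_bound(n), 4))
```

The first test answers "is this die unfair?", while the bound answers the question in the last sentence above: if the largest observed deviation is d, the true per-face bias is at most d plus the bound, with probability at least 1 - delta. At 100k rolls that bound is roughly half a percentage point, which gives a feel for how coarse roll-based testing is compared with measuring the die directly.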