Some of these results are indeed very surprising. The "correct" result of (double(0.3) - double(0.2) - double(0.1)) is -1 * 2^-54 = -5.5511151231257827e-17, and dividing by three comes out to -1.8503717077085942e-17.
Hence, as far as those functions are concerned, the naive mean yielded the worst error in both tests, but stable_mean coincidentally happens to be the best in the first test. I'm slightly surprised that none of them got the correct result.
Heh, I reasoned about the exponent by hand and got -55 but then wasn't sure whether it was correct so I searched for discussions of 0.3 - 0.2 - 0.1 and took the value they quoted (2^-54). That being an off-by-one error does resolve my question.
Hence, as far as those functions are concerned, the naive mean yielded the worst error in both tests, but stable_mean coincidentally happens to be the best in the first test. I'm slightly surprised that none of them got the correct result.