An 80% difference in a microbenchmark is not nothing, but it's hardly unusual. In a real application, these kinds of small differences may well be much less dramatic, especially if you consider that most apps will be tuned to the OS they're designed for and thus take the "happy path" on that OS.
And that 80% difference is in the buffered case, but that's also the least relevant one - you can use user-space buffering (which is normal anyhow, on both OSes) to amortize the system call costs.
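
To make the amortization point concrete, here's a rough Python sketch that counts how many times the underlying `os.write` gets invoked with and without a user-space buffer in front of it. `CountingRaw` and `count_write_calls` are names I made up for illustration, and the 64 KiB buffer size is an arbitrary assumption, not a recommendation. It only counts calls; it doesn't measure time, so it says nothing about how big the per-call cost is on any particular OS.

```python
import io
import os
import tempfile


class CountingRaw(io.RawIOBase):
    """Hypothetical helper: raw writer that counts calls to os.write."""

    def __init__(self, fd):
        super().__init__()
        self._fd = fd
        self.calls = 0

    def writable(self):
        return True

    def write(self, b):
        # Each call here corresponds to one os.write, i.e. one system call.
        self.calls += 1
        return os.write(self._fd, b)


def count_write_calls(chunks, buffered):
    """Write all chunks to a temp file and return how many os.write calls it took."""
    fd, path = tempfile.mkstemp()
    try:
        raw = CountingRaw(fd)
        # Assumption: 64 KiB buffer, chosen arbitrarily for the sketch.
        out = io.BufferedWriter(raw, buffer_size=64 * 1024) if buffered else raw
        for chunk in chunks:
            out.write(chunk)
        out.flush()  # push out whatever is still sitting in the buffer
        return raw.calls
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    small_chunks = [b"x" * 16] * 10_000
    print("unbuffered:", count_write_calls(small_chunks, buffered=False))
    print("buffered:  ", count_write_calls(small_chunks, buffered=True))
```

Without the buffer you pay one system call per small write; with it, the small writes get coalesced into a handful of large ones, so whatever the per-call overhead is on a given OS, it gets spread over far more bytes.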