Agreed, but it seems those benchmarks were made to promote wonderbuild. I'm not sure it's still maintained (it doesn't seem so), and the benchmarks use old versions of cmake and waf.
It would also be nice to compare other build systems, such as ninja+meson.
I'd say Python's runtime has only a small, fixed startup cost, which prevents Python-based tools from beating natively compiled ones in no-op rebuilds (when no source file has changed). Some tools, like tup or ninja, reach insanely low times that are below the startup time of Python itself. But at such low times, it no longer matters whether it's 10ms or 100ms.
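If anyone wants to check the startup cost on their own machine, here is a rough sketch that times how long it takes to launch the interpreter and exit immediately. The helper name `measure_startup` is my own, not part of any build tool, and the figure includes process-spawn overhead, so treat it as a ballpark rather than a proper benchmark.

```python
import subprocess
import sys
import time


def measure_startup(runs=5):
    """Return the best wall-clock time, in seconds, to start and exit the interpreter."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Launch the same interpreter that runs this script and do nothing.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    # The minimum is the least disturbed by other activity on the machine.
    return min(timings)


if __name__ == "__main__":
    print(f"interpreter startup: {measure_startup() * 1000:.1f} ms")
```

Comparing that number with the wall-clock time of a no-op rebuild in tup or ninja should show how much of a Python-based tool's no-op time is just the interpreter starting.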