The latest GCs are _fast_. I benchmarked ZGC and, IIRC, it ate up something like 4 GB/s of garbage (24 threads doing nothing but allocating) with about 2 milliseconds of stop-the-world pause time over around 3 minutes of runtime.
Note that benchmarking GCs properly is really hard. Changes in object size distribution, tree shape, and object lifespan can lead to drastically different results. To make matters worse, the kind of code that is easiest to write as a benchmark tends to be the kind of code that a GC handles really well.
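To make that concrete, here is a rough sketch of the kind of allocation loop I mean. It is not the benchmark I ran: the class name `AllocChurn`, the size ranges, the live-window sizes, and the allocation count are all made up for illustration. It is single-threaded, has no JIT warmup, and only varies size and lifespan (not tree shape), so treat any numbers it prints as a toy comparison at best.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayDeque;
import java.util.SplittableRandom;

// Hypothetical toy harness: all names and parameter values are illustrative assumptions.
final class AllocChurn {

    // Allocates `allocations` byte arrays with sizes drawn uniformly from
    // [minSize, maxSize] and keeps only the most recent `liveWindow` of them
    // reachable. Returns the total number of payload bytes requested.
    static long churn(long allocations, int minSize, int maxSize, int liveWindow, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        ArrayDeque<byte[]> live = new ArrayDeque<>();
        long bytes = 0;
        for (long i = 0; i < allocations; i++) {
            int size = rng.nextInt(minSize, maxSize + 1); // upper bound is exclusive
            byte[] block = new byte[size];
            bytes += size;
            if (liveWindow > 0) {
                live.addLast(block);          // object survives for a while...
                if (live.size() > liveWindow) {
                    live.removeFirst();       // ...then becomes garbage
                }
            }
            // With liveWindow == 0 the array is dead immediately, which is the
            // "easy" case: the collector (or even the JIT) may make it nearly free.
        }
        return bytes;
    }

    // Sums the accumulated collection time reported by the GC MXBeans.
    // What each bean counts depends on the collector, so this is not
    // necessarily stop-the-world pause time.
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // {minSize, maxSize, liveWindow}: arbitrary values chosen for illustration
        int[][] configs = { {16, 64, 0}, {16, 4096, 10_000} };
        long allocations = 2_000_000L;
        for (int[] c : configs) {
            long gcBefore = gcMillis();
            long start = System.nanoTime();
            long bytes = churn(allocations, c[0], c[1], c[2], 42L);
            double seconds = (System.nanoTime() - start) / 1e9;
            long gcSpent = gcMillis() - gcBefore;
            System.out.printf("sizes %d..%d, window %d: %.0f MB/s allocated, %d ms reported by GC beans%n",
                    c[0], c[1], c[2], bytes / 1e6 / seconds, gcSpent);
        }
    }
}
```

Saved as a single file, this should run with something like `java -XX:+UseZGC -Xlog:gc AllocChurn.java` on a JDK that supports ZGC. The first configuration is the easy-to-write kind (small objects that die instantly); the second keeps a window of objects alive with a wider size range. For anything you want to draw conclusions from, a proper harness such as JMH and the collector's own logs are a better starting point than this.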