Pack Z and 32-bit color together into a 64-bit integer, then do an atomic min (or max with reversed Z) to effectively do a Z-query and a write really, really fast.
Nanite writes out the ID of the primitive at that pixel rather than the color, but otherwise yeah that's the idea. After rasterization is done a separate pass uses that ID to fetch the vertex data again and reconstruct all of the material parameters, which can be freely written out without atomics since there's exactly one thread per pixel at that point.