I'm a bit out of the GPU game, so this might be slightly wrong in some places: the issue is with small triangles, because you end up paying a huge cost for them. GPUs ALWAYS shade in 2x2 blocks of pixels (quads), never single pixels.
So if you have a very small triangle (small as in how many pixels of the screen it covers) that covers just 1 pixel, you still pay the price of a full 2x2 block: 4 pixels shaded instead of 1, so you just did 300% extra shading work for that triangle.
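To put numbers on that, here's a rough back-of-the-envelope sketch (my own illustration, not anything from Unreal; the coverage numbers are made up) of the extra shading work the 2x2 quad granularity causes:

```cpp
#include <cstdio>

int main() {
    // coveredPixels: pixels actually inside the triangle.
    // quadsTouched: 2x2 blocks the triangle overlaps; the hardware shades all 4 lanes of each.
    struct Case { const char* name; int coveredPixels; int quadsTouched; };
    const Case cases[] = {
        {"1-pixel triangle",     1,   1},  // 4 pixels shaded for 1 useful one
        {"tiny sliver triangle", 3,   2},  // straddles a quad boundary
        {"large triangle",     900, 256},  // interior quads are fully covered
    };
    for (const Case& c : cases) {
        int shaded = c.quadsTouched * 4;
        double extraWork = 100.0 * (shaded - c.coveredPixels) / c.coveredPixels;
        printf("%-22s shades %4d pixels for %3d useful ones (+%.0f%% work)\n",
               c.name, shaded, c.coveredPixels, extraWork);
    }
    return 0;
}
```

The 1-pixel case is the 300% figure above; for a big triangle the overhead only shows up along its edges, so it's basically noise.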
Nanite auto-picks the best triangle size to minimize this, and probably optimizes for many more perf metrics that I have no idea about.
So even if you do it in software, the point is that if you can get rid of that 2x2 block penalty as much as possible, you can be faster than the GPU's hardware rasterizer doing 2x2 blocks, since pixel shaders can be very expensive.
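To show what "doing it in software" means at the pixel level, here's a minimal CPU-side toy sketch of a per-pixel rasterizer (my own code, not Nanite's; as I understand it the real thing runs in a compute shader and writes depth plus payload into a visibility buffer with 64-bit atomics):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Edge function: > 0 when p lies to the left of the edge a->b (assumes CCW winding).
static float edgeFn(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

void rasterizeTriangle(Vec2 v0, Vec2 v1, Vec2 v2,
                       std::vector<uint32_t>& framebuffer, int width, int height,
                       uint32_t color) {
    // Clamp the triangle's bounding box to the screen.
    int minX = std::max(0,          (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    int maxX = std::min(width - 1,  (int)std::ceil (std::max({v0.x, v1.x, v2.x})));
    int minY = std::max(0,          (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    int maxY = std::min(height - 1, (int)std::ceil (std::max({v0.y, v1.y, v2.y})));

    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            Vec2 p = {x + 0.5f, y + 0.5f};            // sample at the pixel center
            float w0 = edgeFn(v1, v2, p);
            float w1 = edgeFn(v2, v0, p);
            float w2 = edgeFn(v0, v1, p);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0) {      // inside the triangle
                framebuffer[y * width + x] = color;   // shade exactly this one pixel
            }
            // (A real rasterizer also handles the opposite winding and proper fill rules.)
        }
    }
}
```

The key point is that the inner loop touches exactly the covered pixels; there's no notion of a 2x2 quad, so a 1-pixel triangle costs one shade instead of four.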
This issue gets worse the higher the rendering resolution is.
Nanite then picks larger triangles instead of tiny sub-pixel ones, since triangles smaller than a pixel don't add any visual fidelity anyway.
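The usual way that kind of pick is done is with a screen-space error threshold: take the coarsest version of the geometry whose error projects to less than about a pixel. Rough sketch of the idea (the names and the pinhole-projection math are just my illustration; Nanite does this per cluster over a DAG rather than per whole mesh):

```cpp
#include <cmath>

struct LodLevel {
    float geometricErrorWorldSpace; // how far this LOD deviates from the full-res mesh
};

// lods[0] is the finest level; error grows with the index. Returns the coarsest LOD
// whose error, projected onto the screen, stays under roughly one pixel.
int pickLod(const LodLevel* lods, int lodCount,
            float distanceToCamera, float screenHeightPixels, float verticalFovRadians) {
    // World-space size that maps to one pixel at this distance (pinhole camera model).
    float worldUnitsPerPixel =
        2.0f * distanceToCamera * std::tan(verticalFovRadians * 0.5f) / screenHeightPixels;
    float maxAllowedError = 1.0f * worldUnitsPerPixel;  // tolerate ~1 pixel of error

    for (int i = lodCount - 1; i >= 0; --i) {           // try the coarsest level first
        if (lods[i].geometricErrorWorldSpace <= maxAllowedError) {
            return i;                                    // coarsest one that still looks right
        }
    }
    return 0;                                            // nothing coarse enough: use the finest
}
```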
Nanite's software rasterizer is also not used for large triangles, since those are more efficient to do in hardware.
> So even if you do it in software, the point is that if you can get rid of that 2x2 block penalty as much as possible, you can be faster than the GPU's hardware rasterizer doing 2x2 blocks, since pixel shaders can be very expensive.
Of course, the obvious problem with that is that if most of the screen isn't covered in such small triangles, you're paying a large cost for Nanite vs. traditional rendering.
Nanite has a heuristic to decide between pixel-level compute-shader rasterization and fixed-function rasterization. You can have screen-sized quads in Nanite and it's fine.
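Something like this, in spirit (the cutoff and names are my guesses, not Unreal's actual numbers; IIRC the real decision is made per 128-triangle cluster):

```cpp
// Toy version of a "small enough for software raster?" heuristic.
struct ScreenRect { float minX, minY, maxX, maxY; }; // cluster bounds in pixels

enum class RasterPath { SoftwareCompute, HardwareFixedFunction };

RasterPath chooseRasterPath(ScreenRect bounds, int triangleCount,
                            float pixelsPerTriangleCutoff = 32.0f /* made-up cutoff */) {
    float area = (bounds.maxX - bounds.minX) * (bounds.maxY - bounds.minY);
    float approxPixelsPerTriangle = area / (float)triangleCount;
    // Tiny triangles: the quad overhead dominates, so rasterize them in a compute shader.
    // Big triangles: the fixed-function hardware path handles them more efficiently.
    return approxPixelsPerTriangle < pixelsPerTriangleCutoff
               ? RasterPath::SoftwareCompute
               : RasterPath::HardwareFixedFunction;
}
```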