The thing that did it for me was realizing that people on opposite sides of the United States can't play music together if it requires any rhythmic coordination, even with a true speed-of-light signal and no other sources of latency.
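For anyone who wants the numbers, here's a quick back-of-the-envelope in Python (the ~3,940 km coast-to-coast distance and the ~20-30 ms ensemble threshold are my own assumptions, not established in this thread):

    # Rough sketch: one-way signal latency across the continental US.
    C_VACUUM_KM_S = 299_792                 # speed of light in vacuum
    C_FIBER_KM_S = C_VACUUM_KM_S * 2 / 3    # light in optical fiber is ~2/3 c
    DISTANCE_KM = 3_940                     # roughly NYC -> LA, great circle

    one_way_vacuum_ms = DISTANCE_KM / C_VACUUM_KM_S * 1000  # ~13 ms
    one_way_fiber_ms = DISTANCE_KM / C_FIBER_KM_S * 1000    # ~20 ms

    print(f"one-way, vacuum: {one_way_vacuum_ms:.1f} ms")
    print(f"one-way, fiber:  {one_way_fiber_ms:.1f} ms")

    # Tight ensemble playing is commonly said to fall apart somewhere around
    # 20-30 ms of mutual delay, so even the vacuum ROUND trip (~26 ms) is
    # already at or past the threshold, before any real-world overhead.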
Interesting! Presumably the delay would be the last 12 measures if you're playing a 12-bar blues, etc.
Big downside is you're stuck playing to a metronome, which would be enough for me to skip it, but it depends on the kind of music you're playing.
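For scale, the delay this implies is long. A quick sketch (the tempo and time signature are assumptions picked for illustration):

    # Hypothetical: a 12-bar blues in 4/4 at 100 bpm, NINJAM-style, where
    # everyone hears everyone else exactly one full phrase in the past.
    BPM = 100
    BEATS_PER_BAR = 4
    BARS = 12

    delay_s = BARS * BEATS_PER_BAR * 60 / BPM
    print(f"you hear the others {delay_s:.1f} s in the past")  # 28.8 s

So it's not latency compensation so much as a different kind of ensemble: you're always reacting to the previous chorus, never the current one.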
I could imagine that if the music is rhythmically slow and vague and improvised, big latencies are OK, and actually might yield some pretty interesting creative results.
Another model I've thought about is to structure players in a rooted DAG, where each player can hear only the people upstream of them.
E.g., you could build an orchestra by having a conductor and section leaders in a room together (or at least within very low latency of each other). Other players could hear the leaders and play along, and then an audience could hear everyone. You could also do something more complicated like build things out in linear or power-of-2 layers, where each layer can hear everything upstream of it, so that many players would get a partial sense of the orchestral effect.
This could work nicely for improvised music, too, with causality preserved.
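To make that concrete, here's a toy version in Python (the ensemble, names, and topology are invented for illustration):

    from graphlib import TopologicalSorter

    # Toy ensemble: each entry maps a player to the upstream players whose
    # live audio is mixed into their monitor feed.
    hears = {
        "conductor": [],
        "violin lead": ["conductor"],
        "cello lead": ["conductor"],
        "violin 2": ["conductor", "violin lead"],
        "cello 2": ["conductor", "cello lead"],
        "audience": ["conductor", "violin lead", "cello lead",
                     "violin 2", "cello 2"],
    }

    def upstream(player):
        """All players whose audio (transitively) reaches this player."""
        seen, stack = set(), list(hears[player])
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(hears[p])
        return seen

    # TopologicalSorter accepts exactly this "node -> predecessors" shape,
    # and static_order() emits roots first. The root (conductor) hears
    # nobody; everyone downstream gets a feed that is strictly causal.
    for player in TopologicalSorter(hears).static_order():
        print(f"{player:12} hears {sorted(upstream(player)) or 'nobody'}")

This also answers the question below: the player at the root hears only themselves, live, and never a delayed echo of anyone downstream.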
How does that work for the one playing ahead of everyone else? He just doesn’t hear anything? Or he hears his own music from 1 second ago? Or worse, other people’s music from 1 second ago.
This is actually the way to go with client-server programs such as Jamulus. People in distant locations try to choose or run a server close to their geographical midpoint (or, more properly, the midpoint corrected for how the fiber actually runs).
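A rough sketch of that midpoint logic (the candidate cities, coordinates, and the 1.5x fiber-route factor are all illustrative guesses; in practice people just compare ping times to the server list):

    from math import radians, sin, cos, asin, sqrt

    C_FIBER_KM_S = 199_861   # ~2/3 the speed of light in vacuum
    ROUTE_FACTOR = 1.5       # fiber rarely follows the great circle; a guess

    def km(a, b):
        """Great-circle distance between two (lat, lon) points, in km."""
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * asin(sqrt(h))

    def one_way_ms(a, b):
        return km(a, b) * ROUTE_FACTOR / C_FIBER_KM_S * 1000

    players = {"NYC": (40.7, -74.0), "LA": (34.1, -118.2)}
    servers = {"NYC": (40.7, -74.0), "Chicago": (41.9, -87.6),
               "Denver": (39.7, -105.0)}

    # Pick the candidate server that minimizes the worst one-way latency
    # to any player -- i.e., the (fiber-corrected) midpoint.
    for name, loc in servers.items():
        worst = max(one_way_ms(loc, p) for p in players.values())
        print(f"{name:8} worst one-way: {worst:5.1f} ms")
    best = min(servers,
               key=lambda s: max(one_way_ms(servers[s], p)
                                 for p in players.values()))
    print("best midpoint server:", best)

Of course, per the arithmetic at the top of the thread, the midpoint only halves each leg; for coast-to-coast players even the optimal server leaves everyone near the edge of playable latency.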