Yeah. But the latency I had with snapcast was too much for video syncing. Maybe I had some issue other people don't have though. If you just want all audio sinks synchronized, then snapcast is great because latency doesn't matter all that much. But if you want the words you're hearing to match the movement of lips on screen... then you're in a different realm entirely.
That makes sense - I believe by default snapcast has a 1000ms buffer.
I wonder if you could grab the latency correction factor from snapcast in real time and somehow apply it to your video stream as well? This has been raised before it seems [1], and in another issue, the snapcast author recommends looking at RTP-based streaming instead of snapcast [2].
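In case it helps, here's a minimal sketch of what that could look like: ask snapserver for its status over the JSON-RPC control port (1705 by default) and turn the buffer/latency figures into a delay for the video player (e.g. mpv's `audio-delay` property). The field names and whether this correction would be stable enough for lip sync are assumptions on my part.

```python
import json

def status_request(request_id=1):
    """Build the Server.GetStatus JSON-RPC request snapserver listens for
    on its control port (1705 by default). You'd send this over TCP and
    parse the per-client latency out of the reply."""
    return json.dumps({"id": request_id, "jsonrpc": "2.0",
                       "method": "Server.GetStatus"})

def video_delay_s(buffer_ms, client_latency_ms):
    """Delay (in seconds) to apply to the video so frames lag by the same
    amount snapcast delays the audio: buffer minus the client's own latency."""
    return (buffer_ms - client_latency_ms) / 1000.0

# With snapcast's default 1000 ms buffer and a client reporting 40 ms of
# latency, you'd shift the video by 0.96 s:
print(video_delay_s(1000, 40))
```

You'd still have to poll this and re-apply it as the correction drifts, which is presumably why the snapcast author points people at RTP instead.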
Seems like snapcast may not be ideal for this after all :)
I think parent says that +/-200ms is acceptable for snapcast, but unacceptable for lip-syncing.
I am not sure snapcast can't sync down to a few ms. If it can, the remaining issue would be syncing the video with the audio clients, which certainly sounds feasible if it's integrated into the video player. That's what Jellyfin does: https://github.com/jellyfin/jellyfin-web/pull/1011 (I tried to help a bit with that one).