Yeah. But the latency I had with snapcast was too much for video syncing. Maybe I had some issue other people don't have though. If you just want all audio sinks synchronized, then snapcast is great because latency doesn't matter all that much. But if you want the words you're hearing to match the movement of lips on screen... then you're in a different realm entirely.
That makes sense - I believe by default snapcast has a 1000ms buffer.
I wonder if you could grab the latency correction factor from snapcast in real time and somehow apply it to your video stream as well? This has been raised before it seems [1], and in another issue, the snapcast author recommends looking at RTP-based streaming instead of snapcast [2].
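In case it helps, here's a minimal sketch of what that could look like: ask snapserver for its status over the JSON-RPC control port (1705 by default) and turn the buffer/latency figures into a delay for the video player (e.g. mpv's `audio-delay` property). The field names and whether this correction would be stable enough for lip sync are assumptions on my part.

```python
import json

def status_request(request_id=1):
    """Build the Server.GetStatus JSON-RPC request snapserver listens for
    on its control port (1705 by default). You'd send this over TCP and
    parse the per-client latency out of the reply."""
    return json.dumps({"id": request_id, "jsonrpc": "2.0",
                       "method": "Server.GetStatus"})

def video_delay_s(buffer_ms, client_latency_ms):
    """Delay (in seconds) to apply to the video so frames lag by the same
    amount snapcast delays the audio: buffer minus the client's own latency."""
    return (buffer_ms - client_latency_ms) / 1000.0

# With snapcast's default 1000 ms buffer and a client reporting 40 ms of
# latency, you'd shift the video by 0.96 s:
print(video_delay_s(1000, 40))
```

You'd still have to poll this and re-apply it as the correction drifts, which is presumably why the snapcast author points people at RTP instead.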
Seems like snapcast may not be ideal for this after all :)
I think parent says that +/-200ms is acceptable for snapcast, but unacceptable for lip-syncing.
I am not sure snapcast can't sync down to a few ms. If it can, the remaining issue would be syncing the video with the audio clients, which certainly sounds feasible if it's integrated into the video player. That's what Jellyfin does: https://github.com/jellyfin/jellyfin-web/pull/1011 (I tried to help a bit with that one).