As others have pointed out, 500ms is huge for some applications. But when the use case is selecting a song from a playlist, starting to hear the track 1 second later is totally fine. It's often preferable to a stream with <30ms latency that drops a buffer every once in a while.
I'd guess the latency comes from generous buffer sizes that keep playback rock solid even though this runs over TCP, potentially over somewhat flaky WiFi. Keeping separate speakers synced within 1ms is pretty good: you'll cause a greater variation than that just by moving your head to look at the phone with the playlist :)
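For a rough sense of scale on that 1ms figure, here's a back-of-the-envelope sketch in Python. It assumes sound travels at about 343 m/s (dry air at roughly 20 °C), which nobody stated above, and the function names are purely illustrative.

```python
# Assumption: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0


def distance_for_offset_m(offset_s: float) -> float:
    """Distance sound travels during a given time offset, in metres."""
    return SPEED_OF_SOUND_M_PER_S * offset_s


def offset_for_distance_ms(distance_m: float) -> float:
    """Arrival-time offset caused by a given path-length difference, in ms."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


if __name__ == "__main__":
    # Path-length difference between two speakers that equals a 1 ms offset.
    print(f"{distance_for_offset_m(0.001) * 100:.1f} cm")
```

Under that assumption, a 1ms sync error is equivalent to about 34 cm of path-length difference between two speakers, so changing your position relative to them by a few tens of centimetres shifts the arrival times by a comparable amount.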
Plus, that one-second delay is present even with native solutions, like Spotify playing through an Amazon Echo (very small delay, not that annoying) or the Sonos app (a much more noticeable delay, closer to 1-2 seconds).
AirPlay uses a fixed latency of 2 seconds, so 500ms is not bad. Either way, it's not a technical accomplishment or failure: the longer the latency, the longer you can put up with your WiFi breaking without your audio blipping.
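To put numbers on that trade-off, here's a minimal sketch in Python. It assumes uncompressed 48 kHz, 16-bit stereo PCM and a simplified model where the receiver's buffer is full when the network stalls; neither is taken from snapcast or AirPlay, and the function names are made up for illustration.

```python
def buffer_bytes(latency_ms: int,
                 sample_rate_hz: int = 48_000,  # assumed sample rate
                 channels: int = 2,             # assumed stereo
                 bytes_per_sample: int = 2) -> int:  # assumed 16-bit PCM
    """Bytes of PCM audio needed to cover `latency_ms` of playback."""
    frames = sample_rate_hz * latency_ms // 1000
    return frames * channels * bytes_per_sample


def survives_stall(buffer_ms: int, stall_ms: int) -> bool:
    """True if a full buffer outlasts a network stall of `stall_ms`.

    Simplified model: no data arrives during the stall, and playback
    drains the buffer in real time.
    """
    return stall_ms < buffer_ms


if __name__ == "__main__":
    for latency_ms in (10, 500, 2000):
        print(latency_ms, "ms ->", buffer_bytes(latency_ms), "bytes,",
              "rides out a 1.5 s stall:", survives_stall(latency_ms, 1500))
```

In this model a 500ms buffer is only about 96 kB of audio, so the cost isn't memory. What you pay is the delay itself, and what you get back is how long a dropout you can ride out.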
500ms should be enough that you can roam between multiple APs without the audio stream being interrupted. If you're on hard-wired ethernet you should just be able to modify the snapcast source code to reduce it to whatever you want (10ms should be fine).
Tbh, I feel like this was a genius move by Apple. AirPlay’s 2 second delay gives plenty of time for all devices to sync up, and they simply build it into the AV playback so that videos are perfectly synced to audio as well.
However, without that source-side adjustment, 500ms is a lot, especially for video.
Yes, snapcast is really designed for audio streaming. For video it's not going to work (but surely for video cases you want something to stream both audio and video?).
Isn’t this a case where you delay the video on the TV side to match the latency of the audio? I assume your TV has an option for audio delay.
The problem that I’d see is that you probably don’t have this setup all the time, so switching your TV back and forth for an audio delay would be annoying. That and 500 ms might be more of a delay than your TV can handle.
The majority of receivers above a certain point have streaming radio, "cast", and/or "smart TV" features. Just like so much of the rest of the appliance world, most of those are powered by some random ARM or MIPS SoC running some flavor of Linux.
A modern networked AV receiver is sort of analogous to a managed switch: there's a fairly generic CPU running a general-purpose OS that presents the user-facing interfaces, manages configuration, and handles some housekeeping tasks, and then there's specialized hardware that actually deals with the signals and does the real heavy lifting after being set up by the general-purpose processor.
ISTM they could just have a different mode for the system that lowers latency and increases droppage. For a sportscast the occasional pop or crackle wouldn't hurt anything.
If you're a fan of overly complicated setups and looking for low latency, I had some success using JACK on a Raspberry Pi. Be prepared for some tinkering, though.