You can use NTP to get the devices' clocks synced up to much better than necessa...

error503 · on Feb 23, 2023

Kind of. The bigger problem you will have if you try this is that the audio is not clocked by the system clock, and the audio clock is almost always free-running (and even if it were derived from the system clock, NTP et al. don't generally discipline the clock itself, just the OS's presentation of it). So in the case of long-running playback (or continuous, as in this case), you will drift out of sync over time, and it doesn't take that long to become noticeable. And at some point you'll start dropping out due to buffer underflow or overflow. So you do still need to take care of this.
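To put rough numbers on that drift, here is a back-of-the-envelope sketch. The 50 ppm clock mismatch and the 100 ms buffer are illustrative assumptions, not figures from this thread, and the function names are made up for the example.

```python
def drift_ms(ppm_offset: float, seconds: float) -> float:
    """Offset in milliseconds accumulated between two clocks
    whose rates differ by ppm_offset parts per million."""
    return ppm_offset * 1e-6 * seconds * 1000.0


def seconds_until_buffer_exhausted(ppm_offset: float, buffer_ms: float) -> float:
    """Time until a buffer of buffer_ms has fully drained (or filled)
    purely because of the rate mismatch."""
    return (buffer_ms / 1000.0) / (ppm_offset * 1e-6)


if __name__ == "__main__":
    # Assumed values: 50 ppm mismatch, one hour of playback, 100 ms buffer.
    print(f"drift after 1 h: {drift_ms(50, 3600):.1f} ms")
    print(f"buffer lasts: {seconds_until_buffer_exhausted(50, 100):.0f} s")
```

Under those assumptions the two devices would be well over 100 ms apart within an hour, which is why long-running streams need some form of ongoing correction.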

So to work well you do need to resample the audio to match the local audio clock using a sample rate converter, or build some custom hardware that lets you sync the playback audio clocks somehow. Or if you want to be sloppy about it, keep close track of the drift and stuff or drop individual samples as it accumulates.
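The sloppy stuff/drop approach might look something like the sketch below. The function name and the choice to patch the middle of the block are my own assumptions for illustration; a real implementation would decide when to correct based on measured buffer fill or timestamps.

```python
def stuff_or_drop(block, correction):
    """Return a copy of `block` that is one sample longer or shorter.

    correction > 0: the sink is consuming faster than the source delivers,
                    so repeat one sample (stuff).
    correction < 0: the sink is consuming slower, so remove one sample (drop).
    correction == 0: leave the block unchanged.
    """
    samples = list(block)
    if not samples or correction == 0:
        return samples
    mid = len(samples) // 2
    if correction > 0:
        # Duplicate the middle sample.
        return samples[:mid] + [samples[mid]] + samples[mid:]
    # Remove the middle sample.
    return samples[:mid] + samples[mid + 1:]
```

Each call shifts the stream by exactly one sample period, so at 48 kHz you would need roughly 48 corrections to move by a millisecond.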

But yeah, this is all more or less 'solved'.

ryanianian · on Feb 24, 2023

Sonos has a remarkably good implementation of all of this.

For URL-based streams they buffer and NTP to sync. For live streams (e.g. gaming) they p2p multicast and tweak the wifi params in real-time to minimize drops.

The speakers create their own wifi and use MST network heuristics to latency-min route over that versus native wifi or ethernet if you've plugged it in. Sound drops when the wifi spectrum blinks (rarely), but I have never encountered the speakers being out of sync or noticing an echo effect.

And the speakers can use your phone's mic to scan the soundscape of a room to acoustically balance the sound when you set them up. I particularly like how consistent the sound volume is room-to-room even with very different speaker setups.

IIRC they've patented their specific mechanism. So ya, it's solved, but it may be expensive to license.

(Not affiliated with Sonos, I just have a bunch of them and like them a lot.)

error503 · on Feb 24, 2023

Yeah, Sonos is very much the Apple of this space. A solid, user-friendly implementation of several pre-existing concepts into a cohesive product - no small task. I don't think the technologically important parts of this are patentable though, there's both prior art and the obviousness standard to worry about. But very much like Apple's 'rounded corners' case, they've gone after (IMO) obvious UI functionality for such a system to extract money from their competitors.

If you are just interested in the synchronized Audio-over-Ethernet part, AES67 is the industry standard, and a pretty complete open-source implementation can be found at https://github.com/bondagit/aes67-linux-daemon . AES67 is itself a composition of existing standards, though: fundamentally it is mostly SDP for session description, RTP for media, and PTP for clock sync, so you can build it out of a variety of implementations too.
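For a feel of the RTP part, here is a minimal sketch that packs the fixed 12-byte RTP header from RFC 3550. The payload type 96 (a dynamic type), the SSRC value, and the 48-samples-per-packet step are assumptions chosen for illustration, and this is nowhere near a complete AES67 sender (no SDP, no PTP, no payload).

```python
import struct


def build_rtp_header(seq, timestamp, ssrc, payload_type=96, marker=False):
    """Pack the fixed RTP header: version 2, no padding, no extension, no CSRCs."""
    byte0 = 0x80  # V=2, P=0, X=0, CC=0
    byte1 = (0x80 if marker else 0x00) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,            # 16-bit sequence number, wraps around
        timestamp & 0xFFFFFFFF,  # 32-bit media timestamp, counted in samples
        ssrc & 0xFFFFFFFF,
    )


if __name__ == "__main__":
    # Assumed: 48 kHz audio with 48 samples per packet, so the timestamp
    # advances by 48 for each packet.
    for n in range(3):
        header = build_rtp_header(seq=n, timestamp=n * 48, ssrc=0x12345678)
        print(header.hex())
```

The timestamp field is what ties the media to the shared clock, which is why the clock sync layer matters so much for playback alignment.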

For room correction you can look at https://drc-fir.sourceforge.net/ to generate FIR filter coefficients, then you can apply it in realtime with https://github.com/wwmm/easyeffects or https://github.com/HEnquist/camilladsp .
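Applying FIR coefficients boils down to a convolution. The sketch below is a plain direct-form version in pure Python, only to show the idea; the function name and the tiny coefficient list are made up, and room-correction filters are typically thousands of taps long, so a real-time tool would normally use something much faster (for example FFT-based convolution).

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Feeding an impulse through the filter returns the coefficients themselves.
    print(fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25]))
```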

Of course, some people just want it to work; in that case you can shell out for Sonos :p.

rerdavies · on Feb 24, 2023

The patent actually covers a mechanism for electing a master controller for synching and storing configuration parameters. The actual process of synching audio is not covered. Not that difficult to work around the patent. But definitely easy to trip over the patent if you're not careful.

anyfoo · on Feb 24, 2023

True, it was definitely simplified. But yeah, in cases where you really care, there's a bunch of options to do it completely/sufficiently in sync. (A true asynchronous sample rate converter, as it would have to be here, might be a bit expensive, but simple interpolation, or even stuffing/dropping, might be sufficient for this particular use case.)
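A sketch of the "simple interpolation" option, assuming plain linear interpolation between neighbouring samples. This is not a proper asynchronous sample rate converter (there is no anti-aliasing filter), and the function name and ratio convention are my own choices for the example.

```python
def resample_linear(samples, ratio):
    """Resample by linear interpolation.

    ratio = output_rate / input_rate, so a ratio slightly above 1.0
    produces slightly more output samples than input samples.
    """
    if len(samples) < 2 or ratio <= 0:
        return list(samples)
    step = 1.0 / ratio          # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    n = 0
    while n * step <= last:
        pos = n * step
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        n += 1
    return out


if __name__ == "__main__":
    print(resample_linear([0.0, 1.0, 2.0], 2.0))
```

For clock drift correction the ratio would sit very close to 1.0 (for instance 1.00005 for an assumed 50 ppm mismatch) and would be adjusted continuously from the measured buffer level.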

rerdavies · on Feb 24, 2023

Just re-sync at the start of each song. Sound propagating through air introduces ~ 1ms of latency per foot. So if tracks drift out of sync by a few milliseconds, it's no big deal.
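The per-foot figure can be sanity-checked with a quick calculation. This assumes a speed of sound of about 343 m/s (dry air at roughly 20 °C); the 5 ms drift value is only an example.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C
METERS_PER_FOOT = 0.3048


def acoustic_delay_ms(feet: float) -> float:
    """Time for sound to travel `feet` through air, in milliseconds."""
    return feet * METERS_PER_FOOT / SPEED_OF_SOUND_M_S * 1000.0


def equivalent_distance_feet(drift_ms: float) -> float:
    """Distance a listener would have to move to hear the same offset."""
    return drift_ms / 1000.0 * SPEED_OF_SOUND_M_S / METERS_PER_FOOT


if __name__ == "__main__":
    print(f"{acoustic_delay_ms(1):.2f} ms per foot")
    print(f"5 ms of drift is like moving {equivalent_distance_feet(5):.1f} ft")
```

So a few milliseconds of drift is comparable to standing a few feet closer to one speaker than the other.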

error503 · on Feb 24, 2023

That is one solution, and in some scenarios it might not even be noticeable, but it's basically conceding the problem and accepting a guaranteed audio dropout at the end of every 'song', since for this to work you need some dead time to ensure all buffers are drained and start the new stream.

The simplest model is a source that generates a continuous audio stream, and a sink that plays it back; adding the idea of songs complicates the model, and in some use cases might be totally inappropriate. For elevator music, sure, it likely doesn't matter, and maybe you can hide it in a crossfade or something with enough metadata. But this is probably part of a system where you put audio into one device connected to the network (which might include live stuff like PA announcements) and it comes out of a bunch of other ones, not a dedicated elevator music system.

thfuran · on Feb 23, 2023

You just need to take a cue from wifi and use beam forming to send separately synced audio to each person.