The Low Latency Live Streaming Landscape in 2019 (mux.com)
134 points by mmcclure on Feb 6, 2019 | 56 comments



Interesting example: there's currently a major squabble in the British gambling industry over the use of drones to provide low-latency streams of horse racing. Off-track betting is legal here, as is in-play betting - you can place a bet right until the winning horse crosses the line.

The latency for a typical satellite broadcast is about 10 seconds, so several gambling syndicates operate their own drones to provide a private low-latency stream, thereby gaining a huge advantage on in-play betting. The drones are being flown legally in uncontrolled airspace, so the racetracks are at a loss as to how to respond.

https://www.theguardian.com/sport/blog/2019/jan/16/talking-h...

https://www.racingpost.com/news/open-skies-authority-says-no...


How do they prevent the people at the actual races from sending out information and skipping the whole drone complication?


They can't. Having someone in the grandstands with a mobile phone used to be the norm, but a drone provides more information with less latency. As in financial trading, there's a huge financial advantage to knowing more than your rivals or knowing it faster. The time it takes to say "two and five have fallen" might be the difference between profit and loss on a race.


I knew nothing about horse racing so bear with me:

You can still bet when the race has already started or what? Because otherwise I don't quite understand why latency would matter in this case.


You can bet until the last second of the race


Chunked video over WebSockets (or chunked transfer) and scaled-out WebRTC are just hacks. Browsers need to support proper low latency video streaming. There's a reason Skype and Zoom use desktop clients. If you are wondering, WASM is not the answer; you need proper access to the hardware decoders, especially on mobile devices.

We had pretty good low latency video a decade ago with Flash but of course with HTML5 you don't need Flash </sarcasm>


Hangouts or Google Meet are notoriously crappy compared to Skype.

Appear.in is also WebRTC based and seems to work better, but its screen sharing is not as good as Skype's either.


Author here, there's a lot of interesting innovation happening in this space. If anyone knows of anything interesting I've missed, please let me know!


Nice article. High frequency trading is probably one of the most advanced fields in low latency live streaming due to the significant impact that latency, which in some cases is measured in nanoseconds, has on profits.

One standard in HFT is the FAST Protocol[1], an adaptation of the FIX Protocol. One of FIX's newer developments is Simple Binary Encoding[2], which is designed for minimal latency. There are many differences between FIX and "normal" protocols that are really interesting when you realize how much thought went into shaving off as much latency as possible with FIX. Unfortunately, FIX is one of the few solutions in HFT that is publicly available, as many firms make money from being the fastest in a particular area.

I doubt many HFT developments could be adapted for use in video streaming, as bandwidth is rarely an issue and hardware can cost much more. However, I would not be surprised if at some point in the future an HFT firm creates a general solution applicable for video.

[1] https://www.fixtrading.org/standards/fast/

[2] https://en.wikipedia.org/wiki/Financial_Information_eXchange...

Edit: Grammar


I've always found the HFT space interesting, but don't really have any desire to move into the financial sector.

How large are the payloads in general for HFT?


It depends on the application/use case, but typically they are small. For a strategy my team is currently developing, each message contains 2-4 dynamic data points. Messages also include a minimal header[1] and "trailer" (checksum). The header includes data about body length, message type, etc. Messages in my use case are about 64 bytes, which is on the smaller end in terms of payload size; I'd say a typical payload is ~64 to 128 bytes.

It is probably important to mention that some firms communicate with exchanges/brokers via a direct connection rather than over the internet.

[1] https://btobits.com/fixopaedia/fixdic44/block_Standard_Messa...
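
Just as a toy illustration of what a small fixed-layout message like this can look like (the field offsets and checksum are made up for the example, nothing like any real exchange protocol):

    // Toy sketch of a ~64-byte fixed-layout binary message:
    // header (length, type, sequence), a couple of data points, checksum trailer.
    const buf = new ArrayBuffer(64);
    const view = new DataView(buf);

    // Header (made-up offsets)
    view.setUint16(0, 56);        // body length
    view.setUint8(2, 1);          // message type
    view.setUint32(4, 123456);    // sequence number

    // Body: a couple of dynamic data points
    view.setFloat64(8, 101.25);   // price
    view.setUint32(16, 500);      // size

    // Trailer: simple additive checksum over everything before it
    let sum = 0;
    for (let i = 0; i < 63; i++) sum = (sum + view.getUint8(i)) & 0xff;
    view.setUint8(63, sum);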


Nice summary! You touched on it briefly with real(ish) time uses for comms, but there's another segment below this with even tighter latency constraints: in-room signal transport. This is a world that's migrated significantly towards streaming-based distribution over the past ~5 years.

Gear in this space has to hit sub-frame latency (best in market at the moment is around 0.02 milliseconds), maintain extremely high signal quality (4K60 4:4:4 or 4:2:0) and provide perfect sync across all decode endpoints.

Due to these constraints it's a world that lives exclusively in the dedicated hardware space. Some main players that are worth checking out are SDVoE and their associated implementers, Crestron NVX, SVSI, Atlona Omnistream and Lightware UBEX. Dante have also just released a new board that handles all the sync and transmission but lets implementers pick their codec of choice, which should be interesting.

Most gear shipping at the moment uses either JPEG2000 or VC-2 based codecs, or hand-wavy "proprietary" ones that chip vendors refuse to give info on. There's some interesting work being done in the form of the JPEG XS codec too: it provides low latency, but also very low quality loss over multiple encode/decode steps.

Given some of the engineering constraints, it’s a super interesting area to watch.


Kieran Kunhya gave a good talk on related things at Demuxed https://www.youtube.com/watch?v=z1R3QlaaaUA


One (emerging) area is ABR (adaptive bitrate) video delivered over a WebSocket. Throughput estimation can be done using server-side TCP statistics that are updated at each ACK segment (the tcp_info structure includes a delivery_rate member that is pretty helpful), which is more reliable than using client-side info that comes from the reconstructed reliable byte stream.

The MPEG-DASH part 6 spec starts going in this direction, and we tried to run with these ideas in Puffer (https://puffer.stanford.edu). Empirically you can get quick channel changes and better first-chunk quality with this approach; we haven't tried to minimize end-to-end latency below standard levels. And there's probably nothing fundamental that you can do with a WebSocket that you can't do with a sufficiently smart chunked HTTP response.
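
To illustrate that last point, here's a minimal sketch of consuming a chunked HTTP response incrementally with the Fetch API (the URL and handleChunk callback are placeholders, not anything from Puffer):

    // Minimal sketch: read a chunked HTTP response as chunks arrive,
    // roughly equivalent to receiving binary WebSocket messages.
    async function playChunkedStream(url, handleChunk) {
        const response = await fetch(url);
        const reader = response.body.getReader();
        while (true) {
            const { done, value } = await reader.read();  // value is a Uint8Array
            if (done) break;
            handleChunk(value);  // e.g. append to an MSE SourceBuffer
        }
    }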

For really low latency, you want to couple the video codec parameters with the transport's capacity estimate in a way that the WebRTC.org/Chromium codebase is not really capable of doing. See https://snr.stanford.edu/salsify .


What kind of performance did you get client side? A problem I see in the web space is that the fastest encoder you can get is locked inside the browser, so you need something in JS or WebAssembly.


Not sure I quite follow -- the video goes through the same pipeline that it would with conventional DASH (MSE to a video element to whatever decoder the browser provides), and the performance is basically the same. You're welcome to try it! https://puffer.stanford.edu


Thank you for the article. As I have used Nanocosmos H5Live Server in production for over a year I can provide some insight into their product.

It works by repackaging an RTMP stream into GOP chunks and pushing them via WebSockets to the browser, where they're piped to a video player using MSE (Media Source Extensions).

On iPhones, where there is (was) no MSE available, it used something like very short, chunked HLS segments. Not sure I fully understood that part, but somehow they tricked the m3u8 playlist into loading short segments very, very often.

The beauty of it was that it was a drop-in plugin for your existing RTMP-based infrastructure. You just add an instance of their server software and use their JS player to pull your RTMP stream, repackaged on the fly:

    let player = new NanoPlayer({
        server: 'wss://nano_server_url', 
        rtmp:   'rtmp://rtmp_server_url/app/stream.mp4'
    });
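
Under the hood, the browser side of that WebSocket-to-MSE pipeline looks roughly like this (a generic sketch, not Nanocosmos's actual player code; the codec string and URL are just examples):

    // Generic sketch: push fMP4 chunks received over a WebSocket into MSE.
    const video = document.querySelector('video');
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener('sourceopen', () => {
        const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
        const queue = [];
        const appendNext = () => {
            if (queue.length && !sourceBuffer.updating) sourceBuffer.appendBuffer(queue.shift());
        };
        sourceBuffer.addEventListener('updateend', appendNext);

        const ws = new WebSocket('wss://nano_server_url');  // placeholder URL from above
        ws.binaryType = 'arraybuffer';
        ws.onmessage = (event) => {
            // appendBuffer() can't be called while an append is in flight, so queue chunks.
            queue.push(event.data);
            appendNext();
        };
    });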


Is there anything more you can talk, write, or link to about CDN support for sub-200ms streaming? Like, is there a "varnish for WebRTC" or something like it (really more of an ircd for video)? Such that someone could build out a live streaming platform that can be conversationally interactive and yet not require a 1:1 back-to-origin or mesh connection setup.


As far as I'm aware, there aren't any public CDNs that support the sub-200ms approaches without buying a full solution from that vendor.

* Limelight's solution is only really accessible through their low latency streaming products provided by Red5.

* Fastly and Akamai are both chasing ultra-low latency through chunked CMAF delivery.

On a slightly smaller scale, Cloudflare supports WebSockets, so you could use a protocol like the one Wowza is using (WOWZ). https://blog.cloudflare.com/cloudflare-now-supports-websocke...

If you're interested in a toolkit to build out a WebRTC-style CDN edge, I think the best place to start would probably be GStreamer's new WebRTC tooling: https://opensource.com/article/19/1/gstreamer


IIRC BitGravity had a focus on the streaming video CDN space when I looked a few years ago. Not clear if they have any special sauce in this area.


Pushpin could be considered a “varnish for streaming”, and it doesn’t require 1:1 connections with the origin. This may get you close to low latency multimedia streaming with the right tuning.

There’s a live music example here: http://audiostream.fanoutapp.com (code: https://github.com/fanout/audiostream). It plays a song in a loop using GStreamer.


Great article - just a small issue:

It's standard to refer to esports as "esports" or "Esports" rather than "eSports".

(see: https://www.dexerto.com/news/esports-esports-associated-pres...)


Integrating codec and network stack permits higher quality and lower latency. Eg, https://snr.stanford.edu/salsify/


Game streaming (running a game on a remote server) is notable for requiring low latency. And VR will be even worse (<20 ms).


I'm really surprised to see no mention of FTL or Mixer!


Would any of the experts from this thread be able to help out with a fun project involving low-latency video?

I'm trying to get video from a raspi + webcam attached to a drone car that's controlled from a web app. Eventually, I'd like to use it to let visitors from around the world drive around my apartment and maybe play treasure hunt/escape room kind of games.

In order for internet folk to operate the drone without frustration I need the lowest-latency video streaming I can get. Currently I'm using an MJPEG stream, but it has no audio and doesn't take advantage of the on-board H.264 encoding of the webcam (Logitech C920). Car operation and the web app are already done.

lowlatencystreaming@protonmail.com


You probably need to change the problem. Instead of doing real-time control of the steering wheel and speed, you probably need to let people click where they want to go, and then let them watch the car drive there before they make their next click. Switch to a turn-based game feel and use the driving time and decision-making time to hide the latency.


You might have some luck finding someone at a video meetup if there's one near you (https://demuxed.com/meetups). I bet an organizer would be open to you giving a quick lightning talk about the project to their group (if you're in SF feel free to ping me).

There's also the Video Dev slack: https://video-dev.org.


Not really the subject here, but I don't understand why video CDN providers don't offer peer-to-peer solutions yet, such as Streamroot or Peer5.


I know the ISPs are against it, to the point that Netflix essentially threatened to switch to a P2P model if the Comcast peering fiasco wasn't solved amicably.

https://arstechnica.com/information-technology/2014/04/netfl...


Why are the ISPs against this?


Latency is likely higher/less controlled with a P2P solution, and mobile devices can't easily be P2P nodes unless they're on wifi.


Peer5 is actually a really interesting example of how the user experience can be improved using peer-to-peer, particularly in areas with less capable internet infrastructure - their technology _only_ kicks in if the user experience can be improved by using peer-to-peer, and it'll fall back to traditional CDNs if they can't improve the experience.

Despite this logic, they were capable of very impressive P2P offloads during the World Cup. Their datasheet, linked at the bottom of this blog post, is worth a read for more details: https://blog.peer5.com/what-a-world-cup/ (email address needed)


>their technology _only_ kicks in if the user experience can be improved by using peer-to-peer

I'm curious on how well that works in practice.

I'd imagine there are more failure modes versus a regular CDN even after the initial decision of P2P or CDN. For example, you're downloading from another user who suddenly closes their app or whose network connection degrades. You can handle that, but you'll need a larger buffer on average, which increases latency.


My gut instinct is there's both technical and business reasons.

From a technical perspective, traditional CDNs are struggling to grasp what it means to be a CDN in the peer-to-peer space: their networks have been built to support traditional HTTP traffic, and they've been fairly successful delivering that. Adapting to cacheable peer-to-peer objects would mean large architectural changes.

From a business perspective, I suspect they also see a risk: encouraging people to chase peer-to-peer oriented solutions may increase adoption and start to reduce the amount of traffic they serve from their private edge as more viewers peer with each other.


P2P is now slower and less reliable than conventional CDNs. This is due to a combination of bandwidth becoming cheaper, CDNs getting better (more locations, more peering, etc.), reliable always-on desktop PCs going away, etc.


I disagree. I've seen P2P outperform traditional CDNs many times, take a look at some of the data from Peer5 I linked below around improving the user experience.

P2P doesn't need "always on" boxes in large scale live stream scenarios, which is one of the use cases where it works best. I think we'll see a lot of growth in hybrid P2P/traditional CDN in the next couple of years.


The big problem with P2P is it doubles the bandwidth cost for the end user. Not a problem if we're talking desktops in countries that aren't 3rd world (or Australia). But most mobile plans still have strict bandwidth limits. If you have 10GB of data a month, do you want to use 1GB to stream a live event for a few hours, or 2GB?


UDP WebSockets would solve this problem pretty quickly, but we're stuck with WebRTC, which has few open-source servers and none that perform as well as they should. It's effectively proprietary; services like Discord build their own server software to handle WebRTC at scale, and good luck building that yourself.


Last night I was watching the State of the Union via C-SPAN's YouTube stream. My husband walked into the living room, having just driven back from an errand, and jokingly recited the next several sentences along with Trump.

He had been listening to the local public radio station in the car, and by comparing the TV to the stereo's FM receiver I was surprised to find that the YouTube stream was about a full minute behind the radio. I had expected that YouTube would lag behind traditional media, which tend to use things like dedicated satellite relay capacity to spread live events to broadcasters, but I was very surprised at how large the difference was - one that seems noticeable in today's world of people commenting on public events on real-time media like Twitter.

Indeed, watching Twitter I could see responses to parts of the speech that I hadn't seen yet. Must really be interesting for sporting events, where a social media post about a play could "spoil" the game.


Trying to watch a football (soccer) game on DirecTV while your neighbors are watching it on cable or over the air sucks. You hear the "goooal!" screams at least a full 20 seconds before you see the goal scored :)


That's interesting. For the London 2012 Olympic Games, YT had the rights to live stream in tens of countries in Asia and Africa, basically everywhere where the Internet rights hadn't been bought by broadcasters. For latency and reliability, they ran new fiber from 30 Rock (NBC) to a Google POP. Unrelated, Google Fiber also got a license for antennas in the Iowa datacenter: https://www.google.com/about/datacenters/gallery/#/places/1

In this case, though, it's not YouTube pulling streams from the source, it's C-SPAN pushing them. I wonder what kind of setup is in place there.


It would have been interesting to compare the different live streams for latency since at least a half dozen news organizations were putting SOTU on YouTube - an experiment for next time.


For a double-livestream twist: I watched the Super Bowl on the CBS website, and during the second half muted that stream to listen to the Chapo Trap House commentary, which was streaming on Twitch. Even after whatever small delay is caused by Twitch (the article mentions Twitch is quite low latency compared to Amazon, at least), they were mentioning stuff about 40 seconds before I saw it.


> Must really be interesting for sporting events, where a social media post about a play could "spoil" the game.

It gets even better: every World Cup game I've watched in the last two decades has served as a survey of who had the lowest latency in the neighbourhood (because the neighbours' cheers would arrive quicker than the image of the actual goal).


Yes, the lag is annoying both ways: from radio to streaming app, where 30 seconds are repeated, or vice versa, where you have a 30-second gap.

But NBA game streams seem to be about 15 seconds behind, which is the perfect amount of time to be able to see replays by just looking down at my iPad (which is streaming the same game as the TV). =)


ProTIP: multicast

It's been working like magic since the dawn of time. One downside: if you are not an ISP, bad luck...


Great point!

The BBC actually did some really interesting work last year on using MPEG-DASH with Multicast with a demo at IBC, the research can be found here: https://www.bbc.co.uk/rd/projects/dynamic-adaptive-streaming...


Nice! I had been wondering if the move to HTTP/3 over UDP would open up possibilities for media over a multicast HTTP transport. Looks like somebody's already doing just that.

The only downside is that multicast sucks over wifi when many devices (like in a corporate environment) are all trying to view a multicast stream. The time slice allotted to multicast is way too small to handle it. I really wish there were a spec for wifi simplex connections where a channel could be reserved for broadcast, with one signal consumed by many receiver devices.


For real-time communication: I'm not sure why people find high latency to be so difficult. You just pretend everybody is taking a moment to think. Virtual "talking sticks" or radio etiquette is not hard to add either.

What is difficult is audio quality issues. I wish voice and video software had a way to prioritize quality at the expense of latency (i.e. never compromise on delivering quality voice, but pause input or fast-forward through quiet segments as needed to stay roughly in sync).


For what audience?

A HAM radio operator is very aware of the technical process going on, the proper etiquette, etc. It's pretty easy to adapt in that situation.

A random Twitch viewer has likely never thought about broadcast latency before, and has no real reason to. It's just an unintuitive experience when chat reacts to things that happened many seconds ago.

Voice chat is probably the worst because the network and software is notoriously unreliable. Is it latency, is something broken, did they just not hear me? Are we talking over each other, something that can easily happen in casual conversation vs. a radio transmission? That mental overhead is constantly there. Imagine trying to explain to your mother that she should end all her statements with "over" and confirm whenever she hears something. Are you really not sure why people find this difficult, or are you just so proud of your own technical knowledge that you've lost any kind of reasonable perspective on the issue?

Lower latency is an unambiguous improvement in every regard; of course people care about it.


How on earth did this turn into an ad hominem about my technical pride? Because I’m ok with slowing down the pace of a conversation?


No, but because you chose to open with:

>I’m not sure why people find high latency to be so difficult.

Phrasing it that way makes it likely you will be interpreted as believing you're effortlessly better than everyone else at the subject at hand.


This seems to be an unpopular opinion, but I generally agree and see latency (within reason) as a UX problem more often than not.

A lot of the folks I talk to that think they desperately need sub 5 second latency would most likely be completely fine with ~15 seconds with a few minor UX/UI considerations. Since true low latency is fraught with scaling issues or is obscenely expensive, I think that leads to a lot of folks never actually getting ideas off the ground when they could have just hacked something together (cheaply) using off the shelf technology with "normal" latency in a few hours.

That being said there definitely are use cases for low latency and a lot of products/projects would benefit from it, but I don't think it's the hard, upfront requirement a lot of folks see it as.

Edit: For the sake of disclosure, I'm one of the Mux founders and we're currently working on low latency, but haven't released anything yet.


> This seems to be an unpopular opinion, but I generally agree and see latency (within reason) as a UX problem more often than not.

It is a UX problem. One that can be solved by lowering the latency.

Many examples of latency being a problem have been mentioned in the comments here, not to mention the article: sporting events ruined because your neighbor cheered too early, reading tweets about the State of the Union before it appears on your feed, etc etc.

How do you improve this without actually decreasing latency?


Sweep the problem under the rug. Delay everything by the highest latency. Let everyone suffer equally! (this is what bluetooth audio does)



