Cool! Some thoughts from a former Twitch engineer:
- Probably the hardest part of running these things is managing outbound bandwidth costs. You'll need to either limit inbound bitrate or transcode video down to a manageable rate, or else you'll quickly spend a lot of money on shipping 4K video for people. (A rough sketch of that transcoding step follows this list.)
- Right now, your nginx hosts both do ingest and playback, if I understand it right. You might want to separate the two. It makes maintenance easier, and it lets you scale much better - right now, a single stream probably maxes out on viewership based on the CPU capacity of the single nginx host that is ingesting the stream, transcoding it, and delivering it. If you have multiple nginx hosts that can deliver the already-transcoded stream, viewership can scale horizontally.
- Please don't keep using RTMP. RTMP is so stateful that it's pretty hard to manage, it doesn't have a spec, it doesn't have implementations on many devices, and its security is, uh, _weak_. Big players are forced to keep using it because telling their broadcasters to change is hard, but you don't have that problem. You might consider accepting an alternative modern protocol.
- You'll almost certainly need admin and moderation tools soon. Expect lots of pirate streams, as well as some horrific content. You can't run a live streaming platform without admin tools.
- Beware DDoS attacks. This setup looks very, very, very easy to take down, as-is...
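On the transcoding point in the first bullet, here's a rough sketch of what capping the outbound rate can look like, driving ffmpeg from Python. The URLs, stream key, and rates are illustrative placeholders, not taken from the repo:

    import subprocess

    # Pull the incoming stream and re-encode it to a single capped 720p
    # rendition, so viewers never receive the broadcaster's raw (possibly
    # 4K) bitrate. URLs and keys here are hypothetical placeholders.
    subprocess.run([
        "ffmpeg",
        "-i", "rtmp://127.0.0.1/ingest/streamkey",
        "-c:v", "libx264", "-preset", "veryfast",
        "-b:v", "2500k", "-maxrate", "2500k", "-bufsize", "5000k",
        "-vf", "scale=-2:720",
        "-g", "120",  # keyframe every 2s at 60fps, for clean segmenting
        "-c:a", "aac", "-b:a", "128k",
        "-f", "flv", "rtmp://127.0.0.1/play/streamkey_720p",
    ], check=True)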
Having worked with AWS IVS, which uses the same infrastructure design as Twitch (or the other way around): it only accepts RTMP (well, RTMPS to be fair) as an input.
To me it would seem that the industry is going the opposite way, and doubling down on RTMP.
An RTMP connector is planned for the Chime SDK as well (a framework & infrastructure based on Chime for building your own custom meetings), to stream a meeting directly to an ingest service (like IVS/Elemental).
On the AWS side at least, there doesn't seem to be a plan to migrate away from RTMP; if anything, services are being launched that only support it.
On a sidenote, the official AWS recommended way to stream a Chime SDK meeting to IVS, is to use this[0] docker container, acting as a bridge between WebRTC & RTMP. I find the hackiness of it amazing, as it's mostly just an X11 framebuffer, an instance of Firefox loading the webview for the meeting, and ffmpeg capturing the framebuffer and sending it as an RTMP stream to the configured endpoint.
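For the curious, the core of that bridge can be approximated in a few lines; a rough sketch of the same idea in Python, with audio capture omitted. The display number, resolution, meeting URL, and RTMP endpoint are all placeholders:

    import os
    import subprocess
    import time

    DISPLAY = ":99"
    URL = "https://example.com/meeting"          # placeholder meeting webview
    RTMP = "rtmp://ingest.example.com/live/key"  # placeholder ingest endpoint

    # 1. Virtual framebuffer for the browser to render into.
    subprocess.Popen(["Xvfb", DISPLAY, "-screen", "0", "1280x720x24"])
    time.sleep(2)  # give Xvfb a moment to come up
    # 2. Browser loading the meeting page on that display.
    subprocess.Popen(["firefox", URL], env={**os.environ, "DISPLAY": DISPLAY})
    # 3. ffmpeg scrapes the framebuffer and pushes it out as RTMP.
    subprocess.run([
        "ffmpeg",
        "-f", "x11grab", "-video_size", "1280x720",
        "-framerate", "30", "-i", DISPLAY,
        "-c:v", "libx264", "-preset", "veryfast", "-pix_fmt", "yuv420p",
        "-f", "flv", RTMP,
    ])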
Also, AWS IVS as a streaming platform is really impressive. It's a fully managed service, and out of the box you get around 4-5s of latency between the source and the player, with an upper limit of 1080p60 @8.5Mbps. It will also downsample the stream to 720p, 480p, 360p & 160p.
In a similar vein, Jitsi's video recording solution is pretty much the same thing[0].
>It works by launching a Chrome instance rendered in a virtual framebuffer and capturing and encoding the output with ffmpeg. It is intended to be run on a separate machine (or a VM), with no other applications using the display or audio devices. Only one recording at a time is supported on a single jibri.
We are severely lacking standalone WebRTC implementations - the best one is in the Chrome codebase.
> out of the box you get around 4-5s of latency between the source and the player
During the 2018 World Cup I was watching an RTP feed direct from the IBC in Moscow on my desktop in VLC. The goal went in, and strangely I recognised it was a goal for England (I don't really do football). The window was open, and I cheered.
A few seconds later the neighbours cheered - they were watching it on TV.
I remember watching a 4K FA Cup final too in parallel to an off air TV feed. Watched the goal go in on the TV. By the time it had gone in on the 4K feed I'd actually forgotten it was a goal, so I guess that wasn't too bad.
That's a problem with streaming. If you're watching a popular live event like a big football match, and your neighbours are too, you need to be receiving it at the same time to avoid "spoilers".
How do you handle hevc/h265 and resolutions above 4K?
It's not in the RTMP spec, so it's not supported by either ffmpeg or GStreamer. But Chinese companies ship cameras that hack it and run it over RTMP anyway, leading users to pester me about our software being broken :/
For live streaming distribution to loads of viewers, outside of a video meeting scenario, then yeah, that is considered good. Plenty of commercial services have latency 5-10 times that.
It's quite amusing when I see 2-3 seconds being declared "ultra low latency" on the distribution side
In broadcast contribution, where you have someone in the studio talking to a person on the screen, anything over a second brings complaints. You typically aim for under 500ms of processing delay for low-bitrate contributions, and at 25fps, with a bog-standard Blackmagic card going SDI-IP-SDI plus timing, you're looking at 500ms of your budget eaten by the hardware framebuffer. OBE do a better capture card - one which allows access to the data on a line-by-line basis. If you go for something more hardware-based, say a J2K codec, you can get your latency down to a couple of frames.
I had a problem a few years ago with live broadcasts from Kiev -- the ISP we had kept dropping our packets for 125ms at a time. It didn't matter if we sent 20mbit (so 2000 packets per second) or 2mbit (200pps), the number lost in a row matched the 125ms outage.
Network people laugh when I complain about 125ms outages on the internet, but it meant that standard FEC wouldn't work (maximum of 20 burst packets; even if it recovered every lost packet, that would mean errors above a transport stream rate of 1.6mbit).
Now you can use RIST to dial in resends, but with a 100ms RTT you're looking at needing 300-500ms of buffer to cope with that type of outage: time to realise the packets are missing and not just delayed (say 50ms), to ask for the retransmit (50ms), and to get them (another 50ms, then smeared over time, as you can't do an instant retransmit).
Alternatively you can transmit twice and offset, but that still adds 150ms of delay.
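Back-of-the-envelope for that retransmit buffer budget, in Python (the figures are the examples from the comment above, not measurements):

    # Time to notice packets are missing rather than just late.
    detect_ms = 50
    # NACK travels to the sender, retransmission travels back (100ms RTT).
    request_ms = 50
    resend_ms = 50
    # Resends are paced out rather than blasted in one burst; assume
    # roughly another outage-length (~125ms) of smearing.
    smear_ms = 125

    buffer_ms = detect_ms + request_ms + resend_ms + smear_ms
    print(f"minimum receive buffer ~ {buffer_ms} ms")  # ~275ms, so 300-500ms with margin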
To clarify, the 4-5s delay I'm talking about is for IVS, which is a live streaming solution, and doesn't have anything to do with video chat.
It has an RTMP input to which you send a video stream; it re-encodes that and then distributes it as an HLS stream.
And getting sub 5s latency on an HLS stream is pretty fast.
I brought up Chime and video chat mostly as an example of how AWS is pushing RTMP as the way to connect those services.
FWIW, the OP repo uses a Terraform config to deploy the server on a 4 GB RAM server on DigitalOcean.[1] Which costs $20/month and has a 4 TB/month bandwidth allowance.[2]
To give a sense of what 4 TB of bandwidth could cover: about 320 people watching a 3-hour stream at 1080p, at roughly 70 MB per minute per viewer.
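Checking that estimate (70 MB/minute is the 1080p assumption above, about 9.3 Mbps):

    allowance_gb = 4000      # 4 TB monthly allowance, in GB
    stream_hours = 3
    mb_per_minute = 70       # assumed 1080p rate per viewer

    per_viewer_gb = stream_hours * 60 * mb_per_minute / 1000  # 12.6 GB
    viewers = allowance_gb / per_viewer_gb
    print(f"{per_viewer_gb:.1f} GB per viewer -> ~{viewers:.0f} viewers")  # ~317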
Maybe SRT, but it doesn't have much platform support itself, and its spec is pretty strange. I get weird Adobe flashbacks about it, since it seems so heavily pushed by Haivision, and it's really made for a different use case - it's made for big broadcasters like ESPN and TV networks, not for web startups.
The current motion seems to be towards hacking WebRTC to accommodate broadcast ingest. But WebRTC is really huge and complicated, and you only want a tiny sliver of it for this use case. Pretty much all implementations are buggy because it's such a gigantic kitchen-sink design.
There's also RIST (for those who love working groups) and Zixi (for those who love closed source protocols?).
I honestly think that something like HLS could do pretty well (broadcaster slices video into 1-second TS segments and just ships them with HTTP POST), but it's not something I ever saw anyone do. It definitely doesn't meet the TV networks' needs, but it's really easy to scale and straightforward to secure, which is maybe more important.
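A minimal sketch of that push idea, assuming a hypothetical ingest endpoint that accepts segments via HTTP POST (the URL is illustrative; retries, auth, and playlist handling are skipped):

    import glob
    import time
    import requests

    INGEST = "https://ingest.example.com/live/streamkey"  # hypothetical

    # Assume an encoder (e.g. ffmpeg's hls muxer) is writing 1-second .ts
    # segments into ./out; ship each new segment upstream as it appears.
    seen = set()
    while True:
        for path in sorted(glob.glob("out/*.ts")):
            if path in seen:
                continue
            with open(path, "rb") as f:
                name = path.rsplit("/", 1)[-1]
                requests.post(f"{INGEST}/{name}", data=f,
                              headers={"Content-Type": "video/mp2t"},
                              timeout=5)
            seen.add(path)
        time.sleep(0.2)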
WebRTC really isn't that big. Google's implementation is just targeted at their needs. If you are doing it for ingest you just need:
* Basic SDP library, ICE-Lite, DTLS, SRTP
If you ship RIST you pretty much end up with the same thing!
If you go with WebRTC you have implementations in lots of languages as well (not just bindings)
* Go, Python, Pure C/Embedded, Java, Rust, C++
Also WebRTC could let you do SVC or Simulcast. It would be a lot better for scaling if the uploader sent all the different quality feeds (and let congestion control ensure they don't oversend).
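To illustrate how small that ingest slice can be, here's a minimal receiver sketch using aiortc, the pure-Python implementation. Signaling is left out; the offer is assumed to arrive as an SDP string from the broadcaster:

    import asyncio
    from aiortc import RTCPeerConnection, RTCSessionDescription

    async def ingest(offer_sdp: str) -> str:
        """Answer a broadcaster's offer and consume its media."""
        pc = RTCPeerConnection()

        @pc.on("track")
        def on_track(track):
            async def drain():
                while True:
                    frame = await track.recv()  # decoded audio/video frame
                    # hand the frame to a transcoder/packager here
            asyncio.ensure_future(drain())

        await pc.setRemoteDescription(
            RTCSessionDescription(sdp=offer_sdp, type="offer"))
        await pc.setLocalDescription(await pc.createAnswer())
        return pc.localDescription.sdp  # answer to return to the broadcaster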
I really feel that there has been a concerted marketing effort that has hurt WebRTC on multiple fronts.
* Google oversells what they have done. They bundled up a bunch of existing protocols; that doesn't get you as many promotions/awards, though. It also helps them control the standard. If they paint 'Google WebRTC === WebRTC' then they can make a change and everyone else has to play ball.
* WebRTC competitors want to paint it in a bad light so they can sell $x
I dunno, https://www.w3.org/TR/webrtc/ seems pretty huge to me. And then there are all the RFCs and notes you need to also read and understand, and then you realize that the real spec seems to be Google’s implementation... that at least was how things felt in 2018. They totally might have progressed since then.
Yes, you can pick from the WebRTC buffet and build a good ingest protocol, but you need a lot of knowledge to do that - I would guess a tiny tiny number of people who read your comment know what DTLS or even STUN are.
I also agree that there is a branding problem, but I think part of it is that we don’t have a good name for this WebRTC ingest pattern. It deserves a name and some attempt at standardization so tools and software can start building towards it without supporting everything in WebRTC.
Because yeah, I think it's the least-bad thing we have right now.
I think you must be the same Sean-Der who works on the Go WebRTC library? Thanks a ton - that’s serious open source work.
I have been working on https://webrtcforthecurious.com/ to solve that. It approaches WebRTC from the protocol level. I also try to give credit/real history at the end!
I played around with https://github.com/Sean-Der/webrtc-uri but I don't have any experience/friends in the IETF/W3C so it is never going to happen. I agree it would be really amazing. It would be great if a company that had influence picked it up!
Yea I work on Pion! Thanks for using it :) I get a real sense of purpose trying to make WebRTC more accessible/easier to use. I love seeing what people build. I also really fell in love with teaching because of it. It sure has taken up a lot of my time, but 100% worth it.
(SRT itself is a defunct file transfer protocol that's had some life brought back into it by Haivision)
Big broadcasters tend to go for "open standards" from groups like SMPTE and DPP. The "DPP001 - Live IP Profile" recommendation has a variety of different profiles depending on the importance of the media and the network that's involved, but it generally revolves around MPEG-TS over RTP, with either dual streaming (sending the same packets down two different networks), or RIST (so NACK-based error correction), or perhaps both. Certainly when I broadcast major international TV events recently, I went for both RIST and two networks.
The last thing you want is your signal to drop out for 10 seconds due to a power failure while your network reroutes, just as the final touchdown of the Super Bowl happens - or worse, in the commercials.
I work at Mux, though I don't have any special sauce on this. Instead, as a small broadcaster (Twitch and the occasional event--I ran Demuxed this year from my home office!), SRT is a godsend. I throw video around my studio all the time, because NDI has weird frame timing issues and sucks up a lot of bandwidth, and I don't always need real-time video. I would absolutely love it as a predictable, latency-configurable (within some parameters, of course) ingest method for streaming.
(disclaimer: I'm participating in the libRIST development)
RIST! It's a well-thought-out protocol without technical debt (which SRT has). It supports multicast (if you need it), null packet deletion, encryption via PSK or DTLS, and multiplexing (multiplexing support in libRIST is not fully there yet; it's on the roadmap together with DTLS).
The next iteration of the RIST protocol will support "backpressure" on your encoder, so if you have too much packet loss your encoder can scale its video bitrate down, while together with null packet deletion you'll be able to keep your bitrate perfectly CBR.
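For context on null packet deletion: a CBR MPEG-TS mux is padded with null packets on the reserved PID 0x1FFF, which carry no payload and can be dropped on the wire and re-inserted at the receiver. A toy filter, assuming a 188-byte-aligned TS buffer:

    TS_PACKET = 188
    NULL_PID = 0x1FFF  # reserved PID for stuffing/null packets

    def strip_null_packets(ts: bytes) -> bytes:
        """Drop null packets from a 188-byte-aligned MPEG-TS buffer."""
        out = bytearray()
        for i in range(0, len(ts) - TS_PACKET + 1, TS_PACKET):
            pkt = ts[i:i + TS_PACKET]
            if pkt[0] != 0x47:  # sync byte; real code would resync here
                continue
            pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
            if pid != NULL_PID:
                out += pkt
        return bytes(out)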
This sounds interesting, but until it's in ffmpeg and/or OBS (being in ffmpeg would more or less put it in OBS by default eventually, I think), that'll limit its practical usefulness for a lot of people. Is that planned/being pushed for?
RTMP is very old as a protocol and pretty rubbish.
But not because it's old. IP is old. UDP is old. RTP is old.
Using TCP for live streaming isn't a good thing, same as using TCP for a VPN isn't good.
SRT isn't great; it grew from a file transfer program (UDT). UDP file transfer tools were popular 15 years ago, when TCP window scaling and buffers were more miss than hit. Companies like Signiant and FileCatalyst were very big on them, and UDT was an attempt to standardise the approach.
RIST is technically a far nicer protocol than SRT, but suffered massively from the talking-shop development. It came out two years late, and even then didn't have the proper libraries to just drop into your prosumer camera like SRT did.
N.B. It's a shame, as their file transfer management systems were good (control, monitoring, security, scheduling etc), but they were going full pelt on the "TCP is slow, UDP is fast" line - to the point that they deliberately took standard Linux distributions like Ubuntu and Red Hat and changed the default settings to disable TCP window scaling, to show a "side by side" comparison on their sales booths at events.
So, this has been talked about a bunch already, but I'm involved in a small streaming group that occasionally puts on (online, not in person) marathons, and we've been looking for a while for something better than RTMP for ingest from participants. We really haven't found anything good - at least nothing that's viable to set up with open source software of your own.
We just piloted SRT (having about 35 different streamers stream to us with it from OBS into srt-live-server, then into OBS with a media source), which has been mentioned a bunch here, and it's been a mixed bag and I don't know if we'll stick with it.
The big pro is that it doesn't fall behind like the usual RTMP chain (OBS->nginx-rtmp-flv->OBS media source or web source with flvjs) tends to when connection quality degrades.
I don't think this is necessarily an inherent characteristic of rtmp really, but the open software in this chain tends to just grow bigger and bigger buffers when the connection hitches, leading to progressively worse latency you have to 'fix' by resetting the stream. Twitch and other big ingesters seem to avoid this somehow, and I suspect ingesting directly into HLS or similar avoids it, but low latency HLS is still essentially unavailable in open source packaging so is not viable for our use.
On the downside, though, SRT requires somewhat manual tweaking of latency and buffering parameters to work properly, and our experience has been that, while the stream never falls behind, hitches outside the configured parameters produce a particularly ghastly kind of artifacting we've called "streakies", where entire segments of the feed just turn into weird bands of colours. It's possible this is actually a problem in the software, as the bits needed to put this together (particularly srt-live-server) seem pretty immature as yet.
It seems like the future for this kind of use case is likely to wind up being WebRTC, but as of now getting WebRTC to play nice with OBS is kind of painful, usually involving external software that captures a popped-out preview display from OBS. It's a little less than ideal. I think it'll be our next thing to try, though, and obs.ninja looks kind of promising as a platform to help with doing it.
> Probably the hardest part of running these things is managing outbound bandwidth costs.
This might be a good time to mention that Hetzner (where I host) has a lot of unmetered gigabit options. I'm not affiliated, just a happy (long term) customer.
Sadly Germany has some laws that censor certain types of harmless content, so that's one downside, but that's not Hetzner's fault, just something to consider depending on what content you intend to host.
I can attest to Hetzner. I pay €35 per month for a dedicated server, and one of the things I run is a (non-exit) Tor relay, which consumes 3.5 TiB per day on average.
I don’t like RT[M|S]P, but I have not seen an acceptable alternative for realtime streaming.
The main reason I don't like RTMP is that it isn't supported at the OS level in Apple systems (the platform I program for).
HLS doesn’t come close to RTMP. It’s designed for movie streaming, and doesn’t support realtime streaming. I write video surveillance software, and the HLS lag is unacceptable.
Some great points! I built a project similar to this (think online lectures, but it never went anywhere) in spring when the pandemic hit. I highly recommend building a platform like this if you have some spare time - it involves lots of interesting engineering/architecture challenges all over the stack. IMHO, from a technical viewpoint, it's mostly integration work, as all the really hard parts were already done by excellent 3rd party tools and libraries. You'll probably have some kind of RTMP bridge (nginx [1]), wrap ffmpeg for transcoding, and play it back using something like Shaka player [2], with the segments served by some kind of caching reverse proxy. It took me a few weeks to get a prototype running inside a K8S cluster where OAuth-authenticated users could publish RTMPS streams which could be watched by other authed users via HLS in a somewhat scalable way. It was surprisingly easy to build this using Elixir/Phoenix LiveView.
Some thoughts on your comment:
> - Probably the hardest part of running these things is managing outbound bandwidth costs.
This. As others noted, you may get around this by using something like Hetzner's/Scaleway's/OVH's offerings. However, I think they won't be too happy if you really use that much bandwidth all the time. You can probably forget using IaaS from one of the bigger cloud platforms unless you've negotiated some special terms.
> - You'll almost certainly need admin and moderation tools soon.
That's one of the main reasons why I didn't offer a public instance. It's probably best to host a platform like this in a non-public context, such as for members of a single org. Just look at what happened to Firefox Send...
> - Please don't keep using RTMP [...] and its security is, uh, _weak_.
Yes, it is a bit of a pain to work with, but AFAIK you can wrap RTMP inside a TLS session, which is supported by OBS. I think I just exposed a stunnel instance back then which forwarded to an nginx instance that handled authentication/authorization and forwarding to the transcoding backend. This way you won't leak any streaming keys or video data. Please correct me if I'm wrong. If you have any additional pointers regarding RTMP security, I would be highly interested!
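A minimal sketch of that setup, assuming stunnel terminates TLS on port 1936 and forwards plaintext RTMP to nginx on 1935 (ports and certificate paths are illustrative):

    [rtmps]
    accept = 1936
    connect = 127.0.0.1:1935
    cert = /etc/stunnel/fullchain.pem
    key = /etc/stunnel/privkey.pem

OBS can then point its stream URL at rtmps://your-host:1936/... so the stream key never crosses the wire in plaintext.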
Also, as others pointed out, SRT may be just around the corner. I think we're in for some interesting times as LL-DASH/CMAF and similar technologies are more widely supported. Additionally, there are interesting projects like [3] going the WebRTC route for delivering low latency streams.