
Having attempted to use WebRTC as a generic video transport, I can say that it has insurmountable problems. The two biggest issues are:

1) Lack of client-side buffering. This is a benefit in real-time communication, but it limits your maximum bitrate to your maximum download speed. It also makes playback incredibly sensitive to network blips.

2) Extremely expensive. To keep bitrate down, video codecs only send key frames every so often. When a new client starts consuming a video stream, it needs to notify the sender that a new key frame is needed. For a video call this is fine, because the sender is already encoding their stream in real time, so inserting a key frame isn't a big deal. For a static video, needing to transcode the entire thing in real time with dynamic key frames is expensive and unnecessary.


The WebRTC protocol doesn't dictate 1 or 2, although browsers do bake some of their own assumptions into their implementations. By default the client-side buffer can be on the order of hundreds of milliseconds; as you pointed out, this is tuned for real-time and live applications.
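
For example, a minimal sketch assuming a Chromium-based browser; jitterBufferTarget comes from the WebRTC Extensions spec and is a hint in milliseconds (older Chrome builds exposed a similar playoutDelayHint in seconds):

    // Sketch: ask the browser to hold ~2 s in the receive-side buffer
    // instead of the default few hundred milliseconds.
    // jitterBufferTarget is a hint, not a guarantee.
    function deepenReceiveBuffer(pc: RTCPeerConnection, targetMs: number) {
      for (const receiver of pc.getReceivers()) {
        // Not yet in every TypeScript lib definition, hence the cast.
        (receiver as any).jitterBufferTarget = targetMs;
      }
    }

    // Trade two seconds of latency for resilience to network blips:
    // deepenReceiveBuffer(pc, 2000);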

If you're doing something like YouTube/Netflix and want to avoid dropping to a lower-definition rendition of the stream, that too can be tuned, although you'd want to use simulcast and implement your own player (to feed the video and audio frames into the decoder at the pace you dictate), as sketched below.
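
A rough sketch of the shape the own-player route takes. It assumes Chromium's non-standard createEncodedStreams() (the peer connection must be created with encodedInsertableStreams: true; the standard path is RTCRtpScriptTransform in a worker) and a WebCodecs VideoDecoder already configured for the negotiated codec:

    // Sketch: intercept encoded WebRTC video frames and decode them at our
    // own pace with WebCodecs, instead of at the jitter buffer's pace.
    const BUFFER_MS = 2000; // deliberately deep, unlike WebRTC's default

    function pacedPlayback(receiver: RTCRtpReceiver, decoder: VideoDecoder) {
      // Chromium-only API; requires { encodedInsertableStreams: true }.
      const { readable } = (receiver as any).createEncodedStreams();
      const queue: { frame: any; arrived: number }[] = [];

      readable.pipeTo(new WritableStream({
        write(frame) { // RTCEncodedVideoFrame
          queue.push({ frame, arrived: performance.now() });
        },
      }));

      // Drain on our own schedule: hold every frame for BUFFER_MS first.
      setInterval(() => {
        while (queue.length && performance.now() - queue[0].arrived >= BUFFER_MS) {
          const { frame } = queue.shift()!;
          decoder.decode(new EncodedVideoChunk({
            type: frame.type === 'key' ? 'key' : 'delta',
            timestamp: frame.timestamp,
            data: frame.data,
          }));
        }
      }, 10);
    }

The decoder's output callback would paint the resulting VideoFrames to a canvas; the point is just that buffering depth becomes your decision rather than the jitter buffer's.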


None of these problems are specific to WebRTC. You'll run into them in a WebRTC implementation, you'll run into them with QUIC, and even with ffmpeg on the CLI you'll need to specify buffer sizes. As you mention, these are both livestreaming problems, and the more you buffer, the less "live" your stream becomes. If you're interested in transmitting static videos, why not go with HLS, or even just make the static file available for direct download over HTTP, instead of a live technology?


IIRC, the buffer sizes in ffmpeg are more about ensuring that the calculated bitrate is accurate than about ensuring smooth streaming (although you do need your bitrate enforced to guarantee smooth streaming).


IIRC (it's been a bit since I've configured this), you can specify both codec buffers and buffers for streaming to smooth out issues reading from the codec output. I could be wrong though.
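
For reference, a sketch of the flags in question, wrapped in a Node spawn (file names and numbers are placeholders; -b:v, -maxrate, and -bufsize are the real libx264 VBV rate-control knobs):

    // Sketch: x264 rate control via ffmpeg's VBV flags.
    import { spawn } from 'node:child_process';

    spawn('ffmpeg', [
      '-i', 'input.mp4',
      '-c:v', 'libx264',
      '-b:v', '4M',      // average target bitrate
      '-maxrate', '4M',  // ceiling enforced through the VBV model
      '-bufsize', '8M',  // the VBV "codec buffer": larger = smoother average,
                         // but spikier short-term bitrate
      'output.mp4',
    ], { stdio: 'inherit' });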


1) Why can't you buffer on the client side for WebRTC? That sounds like a client issue (what library were you using?), not the protocol.

2) I use the same tactic as HLS: generate your video with a reasonable (~2 second) keyframe interval, and when a new client connects, start sending from the most recent keyframe. See the sketch below.
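
As a sketch (file names are placeholders; -force_key_frames with a time expression is the codec-agnostic route, and -sc_threshold 0 keeps x264 from inserting extra scene-cut keyframes):

    // Sketch: bake a fixed ~2 s keyframe interval into the encode, HLS-style.
    import { spawn } from 'node:child_process';

    spawn('ffmpeg', [
      '-i', 'input.mp4',
      '-c:v', 'libx264',
      '-force_key_frames', 'expr:gte(t,n_forced*2)', // keyframe every 2 s
      '-sc_threshold', '0', // no surprise keyframes in between
      'output.mp4',
    ], { stdio: 'inherit' });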


1) The point of WebRTC is that it’s real-time. If you buffer then it’s not real-time.

2) Adding key frames increases the bitrate greatly which exacerbates problem 1.


1) I don't think WebRTC has one specific point. Lots of users came together with their use cases, and it was designed by consensus. WebRTC can (and does) have toggles around latency/buffering.

2) I am not aware of a way you can have no keyframes but still be decodable at any time. I have just done it 'HLS style' or WebRTC 1:1. Curious if anyone else has different solutions.


1) WebRTC and RTP both have RT in their names; RT stands for real-time. If I recall correctly, the only buffer WebRTC has is the jitter buffer, which is used for packet reordering, not for ensuring that enough has buffered to handle bitrate spikes.

2) Yes, you either need a short keyframe interval (frequent keyframes) or some type of out-of-band signaling framework to request keyframes; WebRTC uses RTCP for this. A good question is why WebRTC considers RTCP necessary at all. Why not generate a keyframe every N seconds, as you do with HLS, and remove the complexity of RTCP entirely? The answer is that many clients cannot handle the added bitrate at real-time speeds.
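
You can watch that signaling from the receive side with standard getStats() fields; a sketch (pliCount and firCount count the keyframe requests this client has sent back to the encoder):

    // Sketch: count the RTCP keyframe requests (PLI/FIR) a receiver has sent.
    async function logKeyframeRequests(pc: RTCPeerConnection) {
      const stats = await pc.getStats();
      stats.forEach((report) => {
        if (report.type === 'inbound-rtp' && report.kind === 'video') {
          console.log(`PLIs: ${report.pliCount}, FIRs: ${report.firCount}, ` +
                      `keyframes decoded: ${report.keyFramesDecoded}`);
        }
      });
    }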


That is a specific implementation, and has nothing to do with the protocol, which certainly doesn't define a "jitter buffer". People routinely use RTMP (which also has RT in the name) to transfer content to streaming services with massive buffers at every step of the pipeline.


Most common browser implementations use an open GOP. That means an I-frame is inserted when needed: on a scene change, or when there's high motion.

Only naive implementations would burst an I-frame onto the network; most pace it. And if needed, you could split your I-frame across several frame intervals and decode it without creating a bitrate burst.

Actually, a lot of WebRTC implementations use a 1 s or 2 s GOP length. Again, it depends on how much control you have over your pipeline. Browser implementations do make some assumptions about the use case.


That is not what open GOP means. Open GOP means pictures can reference IDR frames other than the most recent one in decode order, and is a pain in the ass for various reasons, but is technically more efficient. You're referring to a dynamic GOP.

