Jitsi Meet – Improving Scale and Media Quality with Cascading SFUs (2018) (webrtchacks.com)
333 points by lioeters on April 2, 2020 | 116 comments



We did our daily standup call with it yesterday (mostly audio). 12 people, on a $3 Scaleway Debian instance (2GB RAM). Worked perfectly.

Obviously as we stopped using Zoom we have to manually upload all our personal data to Facebook now.


We tried to, but with most of our team on Firefox it simply doesn't work, unfortunately. Zoom has been the only reliable thing for us.

It's a bit ironic that somewhere, someone is switching from Firefox to Chrome because of data concerns with Zoom.


That's odd. My experience with Firefox on Windows is that I can't get audio to work with Zoom. Jitsi, on the other hand, works perfectly, though there is a small yellow box that warns me that I should use a "fully supported browser".


We use the Zoom standalone mostly. So Firefox isn't involved in the Zoom side.

Jitsi works fine with small groups. But with more than 10 Firefox users in a room, it doesn't work for us. Maybe it works for you if you have much better internet than we have, or otherwise better resources?

We were using a WebRTC-based tool before Corona (DFN) and had deprecated Zoom. Unfortunately, the people running it didn't manage to scale it up sufficiently during the crisis, so we had to fall back to Zoom.


In Zoom's web client, for audio to work with Firefox you'll need the current Nightly, slated for release as version 76 in May. Edit: another poster linked the Bugzilla issue for AudioWorklets.


Anyone know what the reason for the yellow box is?


Because Firefox in Jitsi is seriously not working unless you have small groups and resources to spare. But they are working on it:

https://github.com/jitsi/jitsi-meet/issues/4758

https://github.com/jitsi/jitsi-meet/issues/5439


Does it need more bandwidth or more server resources (or both?)


I am on Ubuntu 18.04 LTS using Firefox, and Jitsi hard-locked my machine. It was still sending packets, but the whole UI was frozen and I had to power cycle the machine.

On OS X using Chrome, it was flawless.


What prevents your team from using "ungoogled chromium"?


Have you tried BigBlueButton?


> Obviously as we stopped using Zoom we have to manually upload all our personal data to Facebook now.

I feel like there is a joke that's going over my head or I simply don't understand. Why would you have to manually sign up/log in/auth/whatever with Facebook because you're using Jitsi now, especially on your own host?


The joke is that Zoom was found to be sending user info to Facebook APIs.


I missed that story so it went completely over my head. Now I could read about it, thanks!


... because Zoom isn't doing it automatically anymore?


That is the joke. :)


Did you use cascading SFUs? What did you deploy? I'd like to set up my own turn server and use jitsi.


It's a fantastic open source solution. Can be self hosted.

Note that traffic does go through the server, hence it needs decent bandwidth. The server, AFAIK, does have the keys to decrypt traffic.


The server (jitsi videobridge) DOES have the keys to decrypt the traffic - i.e. it's not e2e encrypted. This is par for the course with WebRTC SFUs (there are some "workarounds" to support e2e encryption over WebRTC I mentioned in another thread).


The Jitsi team says the server needs roughly 5.5Mbps per Chrome user. Firefox uses a lot more bandwidth and system resources, and degrades the room capacity.

Just something to keep in mind, and after some testing I saw the same results.

My AWS bill was projected to be over $1k/month, so I put it on Linode where it'll cost between $100-200/month. Just about any decent VPS provider would be good options compared to AWS due to bandwidth.


> The Jitsi team says the server needs roughly 5.5Mbps per Chrome user.

That sounds like a lot! I wonder why an almost still image can use that much bandwidth, I guess it has to do with the low-latency requirement but I would love more details on that.


We use a technique called simulcast. It consists of making every participant "work a bit harder" for the good of the bunch.

That is, every participant sends 3 separate video resolutions to the server: 720p, 480p and 180p (this may change due to bandwidth constraints). Then the server will only forward the appropriate layer to each other participant. So, if you are only seeing me in a thumbnail it will only forward the 180p layer. If I become the active speaker (or you choose to pin me to the large view) the server will immediately switch to forwarding the 720p layer.

In addition, we use SVC with temporal layers, so thumbnails may be given at just 15fps or even less.

We do have a trick up our sleeve (but IIRC it had to be disabled for the time being) which involves disabling the simulcast layers that nobody is requesting. That is, if nobody is seeing me in the large view, why send the 720p layer at all?

Hope that helps!
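
A rough sketch of the layer selection described above, in Python. This is not Jitsi's code: the names `Receiver`, `layer_for` and `layers_in_use` are made up for illustration, and letting a pin take priority over the active speaker is an assumption. The 480p layer is never chosen here because the comment only describes the thumbnail and large-view cases.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Layer heights named in the comment above: 180p, 480p and 720p.
SIMULCAST_LAYERS = (180, 480, 720)


@dataclass
class Receiver:
    """A participant receiving video (hypothetical type, for illustration only)."""
    name: str
    pinned: Optional[str] = None  # sender this receiver pinned to the large view


def layer_for(sender: str, receiver: Receiver, active_speaker: str) -> int:
    """Height of the layer a server would forward from sender to receiver."""
    # Assumption: a pinned participant takes priority over the active speaker.
    on_stage = receiver.pinned if receiver.pinned is not None else active_speaker
    if sender == on_stage:
        return max(SIMULCAST_LAYERS)  # large view gets the 720p layer
    return min(SIMULCAST_LAYERS)      # thumbnails get the 180p layer


def layers_in_use(sender: str, receivers: Iterable[Receiver], active_speaker: str) -> Set[int]:
    """Layers of sender that at least one other participant is being sent."""
    return {
        layer_for(sender, r, active_speaker)
        for r in receivers
        if r.name != sender
    }
```

Any layer missing from the set returned by `layers_in_use` is one the sender could stop uploading, which is the (currently disabled) trick mentioned at the end of the comment.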


SVC spatial and quality layers [1] sound like a really good solution to the bandwidth issue. From my (extremely limited) understanding, basically if you skip certain packets you get a lower resolution/quality stream. A client sends a single stream at the best quality it can tolerate, and then the SFU can forward whichever layers to each client depending on what resolution that client wants.

What's the state of this in jitsi? I can only find limited info about SVC [2] and that is only on temporal layer. How much bandwidth does it even save in practice - maybe it's not worth the complexity trade-off?

[1] http://webrtchacks.staging.wpengine.com/chrome-vp9-svc/

[2] https://github.com/jitsi/jitsi-videobridge/blob/master/doc/s...


Thanks for your interesting answer. It doesn't really address my question though, so I will rephrase it:

If I take a 720p video of a webcam and encode it to be delivered as progressive live streaming, the resulting stream is going to be less than 1 Mbps: because the image doesn't move much, I don't need many keyframes. One every 4 seconds is more than enough, and I've seen streams with no more than one every 30s (live streaming of harbours' CCTV cameras, don't ask me why). But of course it won't be realtime either, and you're going to have a few seconds of latency. While this is OK for live streaming, it certainly is not for a video chat room.

What I'd like to know is why the latency requirement reduces the encoding efficiency that much. Do you have any idea?


Naive (and perhaps stupid) question here, why don't you request just the 720p feed from each participant and then rescale the streams as clients need them?


The server would have to downscale the received stream. That's at least one round of downscaling for each participant, possibly more if you want to send different quality versions of the same stream down to different participants.


And while your device is quite capable of rescaling your stream -- even a lowest-tier smartphone will do it with no strain on its GPU -- it's much more of a strain on the server to rescale hundreds or thousands of streams simultaneously.


Additionally: AFAIK you can't easily rescale an encoded stream: you need to decode it, rescale it and then re-encode it. For every single stream! That would be horribly computationally expensive.


RAM bandwidth is easier to manage and provision than physical bandwidth.


Almost still image is a weird way of saying “numerous still images regardless of amount of motion, that have to be stitched together in precise time sorted batches to provide a smooth experience for the viewer.”

Thinking about the use case it’s obvious this isn’t a simple calendar app amount of data.

I bet if you create some images in various resolutions these services support and what looks good to your eye, then fill folders with sequences of them, you’ll see why it’s a lot of bandwidth.

Better to over-communicate and let the client deal with the organization as it’s designed to, and not let some network admin nickel-and-dime over bits.

Financial economy doesn’t really say much about the literal economy of building all these toxic gadgets.

Given the big picture, not sure why such a trivial concern as bandwidth’s ephemeral money cost would foster such strong curiosity.

Even then, code is in a repo. Go learn.


Is there cheaper bandwidth available than Hetzner's 1.19euro +vat per TB? (Linode seems to be at 10$ / TB).


Or Hetzner's dedicated hosts, which have zero cost for bandwidth and unlimited traffic as well. I haven't been able to find anything cheaper than that.


Leaseweb can be a lot cheaper, it depends.

I'm starting an ISP again, we will charge around $0.25 per TB


OVH doesn't limit it afaik.


I set up my own server yesterday and shut down the server in the middle of a conversation. The conversation kept going.

Traffic only goes through the server for users behind NAT, triggering the TURN path.


That is only for 2 participants. For 3 or more (video participants), it uses the videobridge SFU.

Source: https://github.com/jitsi/jitsi-meet/blob/master/doc/manual-i...


I had 4 people, so either the docs are wrong or they have forgone the video bridge on my server and are using their own.


If you had 4 people the video was going through the JVB for sure. Since a Jitsi Meet installation uses multiple components, maybe you didn't shut them all down?

Video shenanigans can be a dark art, but not that magical ;-)


All installed on one server, physically shut the entire server down.


Very possible the docs are wrong. I'm not a Jitsi dev, so I wouldn't be able to confirm or deny without poking around. But they do specifically state that (as linked).

That said, it is worth pointing out that this thread is specifically about videobridge (i.e. scaling beyond a full mesh).


I've been playing with a custom install of Jitsi Meet for the last few days. It was very easy to set up.

I'm kind of fearing trying to deploy it in my company. This has nothing to do with Jitsi Meet and everything to do with the shitshow that is WebRTC in browsers.

I can't find any desktop browsers that use the available GPU for hardware accelerated video encoding. Chromium (Google and Edge) says it's "only available on Chrome OS and Android". Firefox has a flag with no scare-provisos, but I've been unable to tell if it's had any effect.

By default, every browser on my system was set up to prefer the Intel UHD GPU. I've got an NVidia GeForce RTX 2080 in this laptop. Why not use that?

One of the reasons is that all the browsers have a software-rendering blacklist for certain combos of OS/hardware/drivers. There are rare cases where errant WebGL code can cause a full system crash on (checks notes) Android. So if you're one of the unlucky many who have these combos, but also an operating system smart enough to put graphics drivers into userland, you're taking the fast-train to turning your laptop into a blow-dryer. So they tend to take the Intel GPU over the NVidia GPU because apparently Intel's hardware+drivers isn't as buggy.

You can override it in the hidden settings, which means nobody overrides it.

All of this is to say, you could have a very powerful computer and still have very poor WebRTC performance.


I also had a very bad experience with WebRTC in browsers. Just several days ago I tried a videoconference with Jitsi Meet. Although it was receive-only (no camera or mic), so no encoding, and the received video was pretty low-res, it generated enough load on all my cores that after a while it triggered thermal throttling.

Considering that I can play Full-HD H.264 videos with minimal load, this seems ridiculous.


Does this explain why I get poor Jitsi experience on my laptop and better using the Android app on my phone??

I suspect you are hitting on the exact issue I'm having. I have a Thinkpad with just the Intel 5500 GPU, nothing superb. So, how do I figure out if at least that is being used or not?


Windows Task Manager will show you GPU utilization on the Performance tab. Start Task Manager, then start your browser and see if there is an uptick. Then start a meeting to see if there is another one. There probably won't be. Then, disable the software rendering override in your browser: (Firefox: https://wiki.mozilla.org/Blocklisting/Blocked_Graphics_Drive...) (Chrome: https://superuser.com/questions/836832/how-can-i-enable-webg...) and test again. Probably need to restart the browser after the change in settings.


Why would you assume I'm on Windows?

I'm on GNU/Linux here.

But anyway, yes, I see specifically an uptick in the CPU dramatically with meetings, distinct from other browsing.


What's the setting for hardware video encoding in Firefox? media.webrtc.hw.h264.enabled or media.webrtc.platformencoder?


on closer inspection it looks like media.webrtc.hw.h264.enabled is for decoding and media.webrtc.platformencoder is for encoding on Android :(


learn to buy hardware with decent drivers.

hint: it's not nvidia.


Something that isn't mentioned in the docs is room (and server) capacity, or at least rough estimates.

The Jitsi team gave some specific numbers on room capacity in the forums. Each room should reliably handle 20-35 Chrome users (Firefox uses roughly double the resources), and has a cap at 75. Apparently if you're only using audio, then rooms of 70+ people will work fine.

They're working on upping this number to 500 for configurations that have multiple bridges, as well as improving the UI to account for larger meetings.


Where does the difference between Chrome and Firefox come from? Is Firefox using a different video format?




Unfortunately from my experience, Jitsi does not reliably handle 20-35 users


Any Firefox users among those?


Jitsi-meet is very easy to install and to set up. "apt install jitsi-meet" and you are almost done. The documentation is very good. It works great and it is super user-friendly. You have no excuses.


FYI -- I tried this in a fresh Ubuntu 18.04 LXC container and it failed spectacularly.

There's a weird interaction between the NOFILE security limit (`ulimit -n`), Java 8, and Linux-based containers (noted on both LXC and Docker). If NOFILE is too high, Java 8 will paradoxically run out of memory as it tries to allocate some huge number of file descriptors. That took a while to figure out, since generally when you get an error about exhaustion of file descriptors, you expect the opposite!

So just set a lower security limit, right? It doesn't seem to work, I'm assuming because something in the Jitsi install process decides it needs to set them really high for the install session. I haven't debugged what component specifically is doing this yet. This results in the install scripts crashing when the installer tries to initialize the Java CA store.

In theory one could wait for that crash, reset the file limit, and then resume the install process. This works to a certain point, but every time I tried it, the crash in the CA store initialization would leave the dpkg database in a badly broken state that `apt-get --fix-broken install`, `aptitude`, and some manual dpkg futzing wasn't able to fix in the course of the 3-ish hours I spent working on it (complains on libc6-linux-dev package install that /usr/include/linux/something-something.dpkg-new doesn't exist; that folder structure isn't present on disk).

Technically Jitsi is set up by this point because the crash occurs as part of the certbot bootstrap, but I was working on getting Janus to work anyway and just tried out jitsi-meet because they had an autoinstaller that would set up Prosody, JVB, and all that good stuff anyway, so it wasn't really worth plugging through the rest of the way or dealing with the broken half-installed Jitsi, especially since there are many warnings about how you must configure the SSL bits properly in the install documentation.

But it'd be great if someone wanted to fix all that and make my life easier. :)
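
For anyone who wants to check what limit their container actually hands to processes, here is a small Python sketch. It only inspects the NOFILE limit and computes a lowered value; it is not a fix for the install crash described above. The helper name `capped_soft_limit` and the cap of 65536 are assumptions chosen for illustration, since the comment doesn't say which value is low enough.

```python
import resource


def capped_soft_limit(soft: int, hard: int, cap: int) -> int:
    """Return the soft NOFILE limit to use so that it never exceeds cap.

    Hypothetical helper: it leaves already-low limits alone and never
    returns a value above the hard limit.
    """
    unlimited = resource.RLIM_INFINITY
    ceiling = cap if hard == unlimited else min(cap, hard)
    if soft == unlimited or soft > ceiling:
        return ceiling
    return soft


if __name__ == "__main__":
    # Same numbers that `ulimit -n` (soft) and `ulimit -Hn` (hard) report.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"NOFILE soft={soft} hard={hard}")
    # 65536 is an arbitrary illustrative cap, not a recommended value.
    print("soft limit after capping:", capped_soft_limit(soft, hard, cap=65536))
```

Calling `resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))` would apply the lowered soft limit to the current process and anything it spawns afterwards. Whether that survives whatever the installer does to the limits is exactly the open question in the comment.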


Once I heard about it I gave it a go, just to look at how easy it is to set up.

Super easy, + SSL!


> You have no excuses.

Billions of users who aren't using Linux distros with aptitude have an excuse. This isn't remotely close to the "just works" that you need to get wide adoption.


Yet it is probably the easiest and most "just works" solution to self host conferencing.

Of course, if you don't self-host anything, it's not an option for you.


Alternatives to Jitsi are Janus (mentioned in this thread) but also OpenVidu (formerly Kurento) and Mediasoup which provide building blocks to roll your own. All of these also have demos that you can use out of the box for your own conferences.


Just a nitpick: Kurento still exists on its own, and provides a generalist framework of components to build a service that can handle media.

OpenVidu is one such service, it builds _upon_ Kurento, and itself can be used to make videoconference rooms very easily. Otherwise, Kurento is just a media server that handles media but doesn't have the concept of "rooms", "publishers", "consumers", scalability, reliability, etc. so you would have to develop all those things on your own.


Are there any good tutorials for how to set up Jitsi to work for 30-50 people? The discussion forums are hard to navigate.


consider using mumble


mumble / murmur is great, I host it for family and friends to use. That said, it's only voice and text. No video or screen sharing yet.


I was told that Matrix/Riot is a good alternative to Jitsi (can share a meeting URL and offers E2EE). Does anyone have a data point here?


Riot isn't an alternative to Jitsi so much as a complement to it. Matrix is for text chat, and Riot uses Jitsi for calls; while you can embed a Jitsi client into Riot rooms, and it auto-shares the URL, AFAIK group calls are not E2EE even if the room is.

One-on-one calls through Riot _are_ encrypted, but only if made in a one-on-one direct chat.


The riot features page makes it seem very "built-in" (https://about.riot.im/features#voice-video).


riot.im hosts its own Jitsi instance which is built into the UI, so hitting the call button in a room will spawn an embedded Jitsi call in the room.

Recently we added the option for people to use their own self hosted Jitsi instances, if they prefer.

(1:1 calls at the moment are normal WebRTC, directly over Matrix)


I might be breaking the rules, but can you point me towards any information about how to integrate my existing self-hosted jitsi server into my self-hosted matrix install? Finding this seems elusive.


I tried and I don't think it's easily possible.


Thanks for the clarification!


I would love to use this instead of Zoom. Any news about Firefox support?


Patches are being reviewed: https://github.com/jitsi/lib-jitsi-meet/pull/1062

Included: "Fix the order of the simulcast streams for Firefox."

Subscribe to this issue to keep track of things: https://github.com/jitsi/jitsi-meet/issues/4758

From the latest comments you can see that Firefox upstream is also improving things.


They're actively working on it - in the GitHub issue below, Jitsi devs have been engaged with the community, as recently as yesterday.

100% support for Firefox (and other non-Chrome browsers)

https://github.com/jitsi/jitsi-meet/issues/4758


You might want to try Janus Gateway instead. It works fine on Firefox, and is stable and reliable.

https://github.com/meetecho/janus-gateway


I use it for quick meetings on Firefox and so far it works with the limited set of features I need, despite the warning.


Last week I could not manage to use it for a three-way conversation (after 1h of trying everything, and re-initializing the stuff several times). Sometimes the sound of one participant was missing, sometimes the video of one. When we got it to work for a subset of two people, the CPU of everybody was at 100%.


How come it works for me on Firefox? v70.0.1

Edit: got it, it's that particular feature still missing.


Did a 6 person meet last night, all with video, and I was on firefox, worked fine.


I have been considering developing a simple video conferencing solution using this approach with WebRTC. Basically, I was planning on doing something like this example [0] of server-side peers or just a set of forced TURN servers, whichever approach being deployed on servers at the edge.

I figured using anycast IPs, and having server to server communication across regions before then sending back down to the client would be ideal. Each offer and answer signaling would be done via distributed pubsub (e.g. embedded nats) and general persistence also distributed (e.g. something simple like dqlite). Has anyone had success with distributing WebRTC channels like this? Are there any concerns with redistributing, say, raw h.264 and opus as is without special concern for buffering or transcoding? How do slower consumers handle a fast set of UDP packets?

Also, I doubt there's a way, but anyone familiar with an end to end encrypted approach with WebRTC and DTLS when a server is in the middle? I figure not since offer/answer is for a single peer instead of broadcast and I see no approach in the browser to doubly encrypt media streams.

0 - https://github.com/pion/webrtc/tree/master/examples/broadcas...


Using your first solution (a custom pion SFU relay), and selectively routing the streams between "edge" servers would be similar to Octo, as shown here.

Using TURN servers instead would not have the benefit of an SFU (i.e. clients will have to upload to each peer), so it's still a full mesh network.

Using TURN servers with an SFU would work similarly to your pion solution; however, it would also use more bandwidth, as it would be forwarding the same streams multiple times for each peer routed through that TURN server (instead of once per stream with an SFU).

As for e2e encryption over webrtc via an SFU - yes, this is possible, but it's currently very messy (wasm video encoding and encryption streamed over an SFU-bound datachannel with full mesh distribution of the encryption key). There are plans to implement "Insertable Streams" which you will be able to transform (e.g. encrypt) which will allow this to work without the hacks.
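
To put numbers on the mesh vs. SFU difference described above, here is a back-of-the-envelope Python sketch. The function names are made up for illustration, and it assumes every participant sends exactly one video stream that everyone else receives (no simulcast layers, no muted cameras).

```python
def client_upload_streams(participants: int, topology: str) -> int:
    """Copies of its own stream that each client has to upload."""
    if topology not in ("mesh", "sfu"):
        raise ValueError(f"unknown topology: {topology!r}")
    if participants < 2:
        return 0  # nobody to send to
    if topology == "mesh":
        # Full mesh (with or without TURN relaying): one copy per peer.
        return participants - 1
    # SFU: one upload, the server fans it out.
    return 1


def sfu_forwarded_streams(participants: int) -> int:
    """Streams the SFU sends out when everyone receives everyone else."""
    if participants < 2:
        return 0
    return participants * (participants - 1)
```

With 4 participants, each mesh client uploads 3 copies while each SFU client uploads 1, and the SFU forwards 12 streams in total. That shift of upload work from clients to the server is why the server's bandwidth bill dominates the cost discussion elsewhere in this thread.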


>As for e2e encryption over webrtc via an SFU - yes, this is possible, but it's currently very messy (wasm video encoding and encryption streamed over an SFU-bound datachannel with full mesh distribution of the encryption key). There are plans to implement "Insertable Streams" which you will be able to transform (e.g. encrypt) which will allow this to work without the hacks.

So currently Jitsi Meet, the one on the website, is NOT e2e encrypted?


Yeah, not e2e encrypted when it goes through videobridge (as per this article).

I haven't checked, but it's possible that for 1-to-1 or small meetings they go full mesh, which would be e2e encrypted - a few platforms do this.

edit: just checked and jitsi is "full mesh" for 2 participants - if you have 3 or more (video) participants, it switches over to SFU.


I've been excited to try it with our company for a few months, but is it as "user friendly" as Zoom for non-technical users?


Yes. Installed it yesterday, ran a 12 person meeting with no instructions to the users other than "go to this URL". No issues.


I would even say it's more friendly than Zoom. I only tried the hosted version for both, but the experience with Jitsi was orders of magnitude simpler and faster than Zoom.

For creating a room, you need an account on Zoom and it's generally complicated to get to from the homepage. With Jitsi, right on meet.jit.si, you simply pick your url name and you're in.

For joining, Zoom kept trying to push me to download the app, and had to press two tiny buttons to get it to open in the browser. But by default you have no audio or video, so I had to spend 10m teaching my parents how to enable theirs. With Jitsi, it opens right in the browser, and video/audio work with no issue once they accept browser permissions.

At every step, Zoom tried getting me to sign up or download their app, whereas Jitsi works fully in the browser with no account. Creating the room is a single click and I get to pick the url too.


Load a URL on your computer. No accounts, no software. IMO, yes.


People really undervalue the awesomeness of not having to set up an account to use an online service.


No account aside (which both have), Zoom really tries to make you download the app and you have to click a bunch of tiny buttons to get it to open in the browser. Jitsi doesn't do that, it all works right in the browser, including screen sharing and all other features.


Seems more user friendly to me, people just have to visit a url, no need for an installation.


I like Jitsi because it doesn't require any signups and it's simple to create a channel/room. But the quality of video and audio seems a bit lacking. Does anyone have suggestions on how to improve Jitsi Meet's video and audio quality when working with the official smartphone apps and platform (not self-hosted)? Sometimes (on a broadband connection), for some participants, the video from a broadcaster freezes once in a while, and worse, the audio is often not clear even when the video seems fine. This is with just three or four people in the call, with one person turning on the video and audio (like a presenter) and the rest with no video and their mics muted.


Are any of the participants using Firefox by chance? There are known issues being worked out ¹, some in collaboration with Firefox itself ², to improve performance and compatibility.

For now, it seems the best audio/video quality can be achieved by all participants using a Chromium-based browser such as Chrome and Edge.

---

¹ 100% support for Firefox (and other non-Chrome browsers) - https://github.com/jitsi/jitsi-meet/issues/4758

² Bugzilla keyword jitsi-meet - https://bugzilla.mozilla.org/buglist.cgi?status_whiteboard_t...


Thanks. I know about the Firefox issue, but all participants are on mobile — mostly Android and some iOS — using the official Jitsi Meet app.


You can override configuration options by appending `#config.optionname=value` to the meeting url in a fresh new browser tab


Which configurations can ensure video and audio at a good enough quality with audio being clear and video not dropping frames (even if the video is a bit blurry, that’s fine)?


You can lower the video quality in the bottom right menu. You can even limit your own camera's capture resolution, for example #config.constraints.video.height.max=360 . See https://github.com/jitsi/jitsi-meet/blob/master/config.js for all the options.
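
A tiny Python sketch of building such a link, following the `#config.optionname=value` pattern from the comments above. The helper name `with_config_override` and the room name `ExampleRoom` are made up; the option shown is the one quoted above. It handles a single override only, since the thread doesn't show how several are combined.

```python
def with_config_override(meeting_url: str, option: str, value: str) -> str:
    """Append one `#config.<option>=<value>` override to a meeting URL.

    Hypothetical helper: any fragment already on the URL is replaced.
    """
    base = meeting_url.split("#", 1)[0]
    return f"{base}#config.{option}={value}"


if __name__ == "__main__":
    # "ExampleRoom" is a placeholder room name.
    print(with_config_override(
        "https://meet.jit.si/ExampleRoom",
        "constraints.video.height.max",
        "360",
    ))
```

As noted above, open the resulting URL in a fresh browser tab so the override is picked up when the meeting loads.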


What about the cost in terms of per person per hour?


Depends on the bandwidth charges of the SFU host. Mostly that means computing the volume (5.5 Mb/s * participants * duration), though some hosts don't bill by volume, in which case it becomes a matter of whether a sufficient data rate is available (drop the duration) or of paying for the peak consumed. That's just the A/V conferencing, not file sharing or other incidentals one often finds useful.
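
The volume formula above, written out as a Python sketch. The function names are made up, and it assumes decimal units (1 Mb = 10^6 bits, 1 TB = 10^12 bytes) and that the 5.5 Mb/s per-participant figure quoted in this thread holds for the whole meeting, which is a rough upper-end estimate rather than a measurement.

```python
def conference_volume_tb(participants: int, hours: float,
                         mbps_per_participant: float = 5.5) -> float:
    """Estimated server traffic in TB: rate * participants * duration."""
    bits = mbps_per_participant * 1_000_000 * participants * hours * 3600
    return bits / 8 / 1e12  # bits -> bytes -> terabytes


def conference_cost(participants: int, hours: float, price_per_tb: float,
                    mbps_per_participant: float = 5.5) -> float:
    """Estimated traffic cost for a volume-billed host, in the price's currency."""
    volume = conference_volume_tb(participants, hours, mbps_per_participant)
    return volume * price_per_tb


if __name__ == "__main__":
    # 12 participants for one hour at the $10/TB figure mentioned upthread.
    print(conference_volume_tb(12, 1))
    print(conference_cost(12, 1, price_per_tb=10.0))
```

Dividing the result by `participants * hours` gives the per-person, per-hour figure the parent asked about. It covers traffic only, not the cost of the instance itself.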


It’s free and open source.


Server time still costs money, I'd be curious to see how much at scale.


If you have a Commodore 64 you are OK.


I wanted to evaluate it but was very put off when, on trying to connect it to my Google Calendar, they asked for full access to my YouTube profile. I get that they have some YouTube integration, but binding those things together without any mention of why was enough for me to click cancel. Sucks because I really want to ditch Zoom.


Jitsi has really been kicking it at my school lately; I wrote a chat bot for our Nextcloud that creates meetings and now up to 150 people can easily join into one conversation!


Can someone expand the SFU acronym? I chased the link to sfu-tagged blogs at TFA and an expansion is nowhere to be found.

EDIT: Thanks!


Looks like it's a WebRTC acronym: https://webrtcglossary.com/sfu/

SFU stands for Selective Forwarding Unit.

At times, the term is used to describe a type of video routing device, while at other times it will be used to indicate the support of routing technology and not a specific device.

An SFU is capable of receiving multiple media streams and then decide which of these media streams should be sent to which participants.


non-intuitively at the end of first section:

> Selective Forwarding Units (SFU)


We started to use it at our startup. If anyone needs to set it up, we can do so in the cloud for them.


I tried installing this on two different Ubuntu machines using their Ubuntu repo and it did not work. Worked fine on both Android and iOS. This is about what I expect from a Java application: the Java ecosystem is so convoluted and fragile that the only real options are Android, iOS, and Docker. I wish someone would write something like this in Go or Rust.


Docker4thewin :)


Does anyone know how the audio quality and latency of Jitsi Meet compare with Zoom?


I've used it several times to chat with family members all over the globe, and I use Zoom at work. Jitsi's great - I don't perceive a difference.


Too bad, like most other solutions, Jitsi, by default, shares screen at 5 FPS.


Is that possible to change, perhaps if self-hosting? Video quality of screen sharing and frame rate are absolutely atrocious, to the point that you have to wait ten seconds until text becomes readable.



That's not been my experience. By default, it seems to adapt frame rate based on bandwidth. Some are 15 fps, some more, some less.


I was talking about screen sharing FPS, not the normal video FPS. I have plenty of bandwidth on my optic fiber connection. You can take a look at the situation at: https://community.jitsi.org/t/desktop-sharing-frame-rate-con...

and

https://community.jitsi.org/t/screen-sharing-fps/25132



