Whenever these threads pop up I always ask: is there an Erlang/Elixir implementation anyone knows about? I check every couple of months and come up empty (there is one closed-source one). Being able to speak WebRTC natively to an Elixir/Phoenix cluster would be a pretty killer app, but the hurdle to get there is quite high.
In terms of communication between client and server, I'm not sure what you get with WebRTC that you don't get with websockets? What's your intended application?
For game-related use cases, we need to be able to specify which data is reliable and which isn't. WebSockets run over TCP, which doesn't give you that control.
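For illustration, the browser API lets you ask for exactly that per channel; a minimal sketch (channel names made up):

    const pc = new RTCPeerConnection();

    // Reliable, ordered delivery (TCP-like): fine for chat or game events.
    const events = pc.createDataChannel("events");

    // Unreliable, unordered (UDP-like): a stale position update gets dropped
    // instead of blocking newer ones behind a retransmission.
    const positions = pc.createDataChannel("positions", {
      ordered: false,
      maxRetransmits: 0,
    });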
I also would like to have a WebRTC lib usable from Rust, but it doesn't necessarily need to be written in Rust itself. This one being in Go means it's probably hard/inefficient to write bindings for it, but maybe this one (in C) could do the job: https://github.com/rawrtc/rawrtc
No, signaling is different from STUN. Signaling is basically pairing together the people who want to communicate so that they can get the info required to connect to each other; STUN is how they find out their public IP:port pairs; and TURN is how they can talk through a relay if direct communication fails.
So you always need some form of signaling but that can be over email or even a handwritten note if you prefer, although it is usually done over HTTP/websockets.
STUN is required if you are behind some sort of NAT.
TURN is required if your NAT does not play well with hole-punching.
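To make that concrete, this is roughly how a browser client gets told about STUN/TURN. A sketch only: the URLs and credentials are placeholders, not real servers.

    const pc = new RTCPeerConnection({
      iceServers: [
        // STUN: "what is my public IP:port?" (needed behind most NATs)
        { urls: "stun:stun.example.com:3478" },
        // TURN: relay of last resort when hole-punching fails.
        {
          urls: "turn:turn.example.com:3478",
          username: "user",
          credential: "secret",
        },
      ],
    });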
As you can see if you run that, it's a request-response-like flow. So sure, you can send the initial offer in the URL, but then you somehow need to get the answer back to the initiator.
So while you can have an initiator URL sent to the responder, and then the responder sends back a URL to the initiator, that is still not "click link and you are connected".
Handling signaling is pretty much the easiest bit of WebRTC, as it is basically just an HTTP/websocket echo server with some ID or similar for the meeting.
If you have a websocket server (or REST & SSE as I usually do) you can just have meet.example/{meetingId} and echo everything on meetingId to all others on the same meetingId. That is as simple as the web chat examples that thousands of beginner programmers create in their first year.
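A rough sketch of such an echo server in Node/TypeScript with the `ws` package (all names illustrative):

    import { WebSocketServer, WebSocket } from "ws";

    const rooms = new Map<string, Set<WebSocket>>();
    const wss = new WebSocketServer({ port: 8080 });

    wss.on("connection", (ws, req) => {
      // The path doubles as the meeting ID, e.g. "/DiscussImportantStuff".
      const meetingId = req.url ?? "/";
      const room = rooms.get(meetingId) ?? new Set<WebSocket>();
      room.add(ws);
      rooms.set(meetingId, room);

      // Echo every message (offer/answer/ICE candidate) to the other peers.
      ws.on("message", (data) => {
        for (const peer of room) {
          if (peer !== ws && peer.readyState === WebSocket.OPEN) {
            peer.send(data.toString());
          }
        }
      });

      ws.on("close", () => {
        room.delete(ws);
      });
    });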
You should also consider that the signaling info (called the SDP) does not have a set lifetime: in some cases it can be valid indefinitely, and in some cases for less than a minute. So if you encode the SDP in the URL you:
1. Can't set up a meeting URL beforehand, which is how most people want them to work.
2. Need a back-and-forth over some other medium like email/chat/pigeon.
So it's just adding an extra step where the owner needs to also click on a new URL. But I agree that since the signalling server is rather dumb, this could probably also be outsourced to a "public signalling server"?
Yeah, that is possible, but my point is that it sorta breaks how people are used to joining these kinds of meetings. The back and forth required is not what most people expect.
If you have more people then you'd need to do this once per person (after that they can gossip the SDP over data channels to find the other participants).
Usual flow:
1. I and other people go to meet.example/DiscussImportantStuff
Your proposed flow:
1. I go to meet.example/DiscussImportantStuff (and it generates my SDP in the background and appends it to the URL; see the sketch after this list)
2. I send meet.example/DiscussImportantStuff#MySDPHere to my friend
3. He goes to meet.example/DiscussImportantStuff#MySDPHere (and it generates an answer SDP and swaps it into the URL)
4. He sends me back meet.example/DiscussImportantStuff#FriendsSDPHere
5. I go to that link and we are connected.
Repeat steps 2-5 for each participant.
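For what it's worth, step 1 of that flow would look roughly like this in the browser. This is a sketch under assumptions: the meeting URL is made up, and since there is no trickle ICE over a copy-pasted link, we wait for candidate gathering to finish first.

    async function offerUrl(pc: RTCPeerConnection): Promise<string> {
      await pc.setLocalDescription(await pc.createOffer());

      // No trickle ICE over a copy-pasted URL: wait until the SDP
      // contains all gathered candidates.
      await new Promise<void>((resolve) => {
        if (pc.iceGatheringState === "complete") return resolve();
        pc.addEventListener("icegatheringstatechange", () => {
          if (pc.iceGatheringState === "complete") resolve();
        });
      });

      // Stuff the whole local description into the URL fragment.
      const sdp = btoa(JSON.stringify(pc.localDescription));
      return `https://meet.example/DiscussImportantStuff#${sdp}`;
    }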
Considering how little technical complexity is saved, and that you still need some sort of communication channel set up anyway, I don't think the proposed flow is worth it.
Curious though if the "signalling server" can be abstracted away the same way STUN servers are: put a few URLs of signalling servers in the client app and have it choose whichever. They would all need the same echoing capabilities.
The point is to not have to maintain or spin up any servers when developing WebRTC apps, but to make them fully autonomous.
Something like this could also be pushed further to develop some sort of DHT around WebRTC, so that the process of finding "signalling servers" can be made even more self-sufficient if the hardcoded URLs in the client code are all offline.
EDIT: tried to be more clear by condensing the comment into two questions:
1. How can you trust trust if it is established over an untrusted channel and you have no previous store of trust?
2. How can you verify identity when you have no trust and it is communicated over an untrusted proxy (the signaling server)?
STUN/TURN play no part in establishing the trust between the parties; they just facilitate it by acting as a lookup service or a forwarding service. The signaling has to be trusted for the communication to be trusted.
---
Original comment:
---
One problem is that signaling is pretty specific to the app using it; for example, how a meeting is determined and how it is used is very different between Zoom, Google Meet, Slack and so on. There is also a question of trust: in a P2P WebRTC flow you can have end-to-end encryption, but it still requires you to trust the signaling (since you have no way to communicate trust before signaling).
For purposes where you have a previous channel to communicate trust, you probably don't need signaling via a third party; and for purposes where you don't have such a channel, you probably couldn't trust the signaling party if it was just an open relay on the net.
For the "free signaling server" to be a good solution it would first have to handle the problem of proven identity, which is something that even facebook with billions of users have a problem with.
DHT solves a very different problem than identity: the problem is not being able to address a user, the problem is being able to speak to the right user. WebRTC provides a channel to do that if you point it at the right user. Our problem is finding that right user in a secure, smooth way.
You can signal in more ways than this: email, QR code or any method that can transfer a small amount of data [1]. You do need to think of the security of this channel though as it forms the basis of authenticating the other party through the fingerprint attribute in the signaled SDP payload [2].
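For example, fishing the fingerprint attribute out of an SDP blob so it can be compared over whatever channel you used is straightforward. A sketch, assuming a standard a=fingerprint line:

    // Returns e.g. "sha-256 AB:CD:...", or null if the line is absent.
    function sdpFingerprint(sdp: string): string | null {
      for (const line of sdp.split(/\r?\n/)) {
        if (line.startsWith("a=fingerprint:")) {
          return line.slice("a=fingerprint:".length).trim();
        }
      }
      return null;
    }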
For my project, which has a server-side peer connection, I use a simple HTTP POST from the browser via fetch: it sends the offer and the response carries the answer. Then, when I need renegotiation, I send the offer from the server side over the data channel and wait for the client-side answer back over the data channel. Works fine.
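The browser side of that first exchange can be as small as this sketch (the /signal endpoint is a made-up name, not part of any particular library):

    async function connect(pc: RTCPeerConnection): Promise<void> {
      await pc.setLocalDescription(await pc.createOffer());

      // POST the offer; the server-side peer connection answers in the response.
      const res = await fetch("/signal", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(pc.localDescription),
      });
      await pc.setRemoteDescription(await res.json());
    }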
In a 1-to-1 setting, it obviously doesn't work. But in a mesh setting with n connected peers, when you want to add a new peer, you can perform the first signaling step via a server to connect with one member of the mesh, and then do all the subsequent signaling through this peer. This way, to set up a mesh with n users you only need n-1 signaling messages processed by your server (down from n(n-1)/2 if you perform everything through a centralized signaling server; with n = 10 that's 9 instead of 45).
In practice, I know nobody who does that though. I don't think the extra complexity is worth it if your meshes never get really big, and with WebRTC you rarely encounter situations where you have more than a few peers in the same mesh (Google Chrome even used to struggle a lot if you had more than a few data channels open on the same page, while Firefox handled hundreds of connections without issue).
Ah, yeah, I was thinking of the initial connection.
I guess the mesh needs to be pretty small so that you can have a fully connected mesh or you'd need some way to deal with netsplits and a gossip-style protocol for discovery of new nodes, right?
I have the same question. If it doesn't work on iOS, it's not a cross-platform stack. iOS is the third most popular operating system in the world after Android and Windows.
I'd like to stream high-quality, low-latency audio from a C++ app alongside webcam/mic from Chrome/Firefox. The receiver should get the video and a single mixed audio stream. Maybe even screenshare and mouse/keyboard control. Is this possible?
Lots of products, including ours, are running on Go. It's not a fad, it's a useful tool. Not the best programming language in the world, but it still allows you to produce great results in a short time. Kind of like what Delphi used to be for the Windows desktop.
Is Java a fad? Its early years were much more hype driven than Go has been at any time during the 10 years Go has been around.
Python? JavaScript? Ruby? Rust?
What I like about the Go community is that it appears to be very measured. Unlike many other communities where the way a language is used becomes bigger than the language. Like Spring and Java. Rails and Ruby.
Or Rust and “look I reimplement X despite nobody giving two shits”. :-)
(My big disappointment with Rust is that it hasn't found its niche. I'd like that niche to be embedded operating systems. We could certainly use Rust there, because C, C++ and the utterly shit Python-junk used to stitch things together are just painful. Competing for attention in server development, or any other area that has very strong contenders, is clearly an uphill battle.)
Go is over a decade old, not counting a couple of years of development in private. I'm not the biggest fan of the language but it's hard to think of it as a fad.
Fad or not, in terms of programming languages ten years isn't such a long time. It typically takes a few years just to reach the ecosystem maturity and community size needed to become a popular language. Rust is also ca. ten years old and I'd argue that it still hasn't reached its full potential.
The language absolutely was a fad for a while: the shiny new thing everyone was moving to. That is how languages get popular. But languages, unlike terminology, don't just disappear.
> Ship to Mobile, Desktop, Servers and WASM all with one code base.
Dumb question: what would a WASM implementation be useful for? Browsers don't expose "raw" TCP or UDP connections, so even if you have a fully working WebRTC implementation in WASM, there would be no way to connect to anything when run inside a browser.
People are building server-side WASM runtimes, which I suspect is your answer. To me the other answer is that they could provide the same API in the browser through a shim.