Hacker News
Modern and Cross Platform Stack for WebRTC (github.com/pion)
143 points by todsac on May 1, 2020 | 48 comments



Whenever these threads pop up I always ask: is there an Erlang/Elixir implementation anyone knows about? I check every couple of months and come up empty (there is one closed-source one). Being able to speak WebRTC natively to an Elixir/Phoenix cluster would be a pretty killer app, but the hurdle to get there is quite high.


In terms of communications between client and server, I'm not sure what you get with WebRTC that you don't get with websockets? What's your intended application?


For game-related use cases, we need to be able to specify which data is reliable and which isn't. WebSockets use TCP, which doesn't give you that control.


Yep, also I’d like to be able to transfer voice/video through the Erlang cluster.


Interesting! Personally I am looking forward to a Rust implementation.


I also would like to have a WebRTC lib usable from Rust, but it doesn't necessarily need to be written in Rust itself. This one being in Go means it's probably hard/inefficient to write bindings for it, but maybe this one (in C) could do the job: https://github.com/rawrtc/rawrtc


Yeah, me too. What's the deal in Cargo with developers sitting on package names with empty code repos?

This guy has a bunch of WebRTC-related ones: https://crates.io/crates/webrtc


Some signaling strategies for WebRTC:

- WebSockets

- XHR and other Comet options

- SIP over WebSockets

- XMPP/Jingle

- WebRTC's Data Channel

Obviously, one would want the data channel (last option). There's some work done in this area. However, it's still not fully featured.


For anyone curious, there is a great article [1].

WebRTC enables peer to peer communication.

BUT...

WebRTC still needs servers:

- For clients to exchange metadata to coordinate communication: this is called signaling.

- To cope with network address translators (NATs) and firewalls.

-------------

[1] https://www.html5rocks.com/en/tutorials/webrtc/infrastructur...


You need a signaling channel to your peer. This doesn't have to be a server, but it makes things simpler.


My preference is IPoAC.


I think that adds some latency.


But you can use publicly available servers to do that, no? STUN servers and such. So you don't need to roll your own.


No, signaling is different from STUN. Signaling is basically pairing together the people who want to communicate so that they can get the info required to connect to each other; STUN is how they find out their public IP:port pairs; and TURN is how they can talk over a proxy if direct communication fails.

So you always need some form of signaling but that can be over email or even a handwritten note if you prefer, although it is usually done over HTTP/websockets.

STUN is required if you are behind some sort of NAT.

TURN is required if your NAT does not play well with hole-punching.
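
To make the STUN part concrete, here is a minimal Python sketch of the 20-byte Binding Request header a client sends to a STUN server to learn its public IP:port, following the header layout in RFC 5389. The function name is made up for illustration, and sending the packet over UDP and parsing the response are left out.

```python
import os
import struct

STUN_BINDING_REQUEST = 0x0001   # message type: Binding Request
STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389


def build_binding_request(transaction_id=None):
    """Return a STUN Binding Request with no attributes (header only)."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Header: type (2 bytes), attribute length (2 bytes, zero here),
    # magic cookie (4 bytes), all in network byte order.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id
```

The server's reply carries the address it saw the request come from, which is the public IP:port pair mentioned above.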


Ah indeed. But can't the handshaking be done without rolling your own signaling server?

Here's a great example: https://jameshfisher.github.io/serverless-webrtc/index.html

Check out the process in the console! For instance you can do the handshake the same way you'd send someone the URL of the actual thing.


As you'll see if you run that, it is a request-response-like flow. So sure, you can send the initial offer in the URL, but then you somehow need to get the answer back to the initiator.

So while you can have an initiator URL sent to the responder and then have the responder send back a URL to the initiator, that is still not "click link and you are connected".

Handling signaling is pretty much the easiest bit of WebRTC, as it is basically just an HTTP/WebSocket echo server with some ID or similar for the meeting.

If you have a WebSocket server (or REST & SSE as I usually do) you can just have meet.example/{meetingId} and echo everything on meetingId to all others on the same meetingId. That is as simple as the web chat examples that thousands of beginner programmers create in their first year.
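
As a rough illustration of the "echo everything on meetingId to all others" idea, here is a minimal in-memory Python sketch. It leaves out the transport entirely (no WebSocket or SSE handling), and the class and method names are hypothetical, not taken from any library.

```python
from collections import defaultdict


class SignalingRelay:
    """Relays every message posted to a meeting ID to the other members."""

    def __init__(self):
        # meeting ID -> {peer ID: list of messages waiting for that peer}
        self.rooms = defaultdict(dict)

    def join(self, meeting_id, peer_id):
        self.rooms[meeting_id].setdefault(peer_id, [])

    def send(self, meeting_id, sender_id, message):
        # Echo to everyone in the meeting except the sender.
        for peer_id, inbox in self.rooms[meeting_id].items():
            if peer_id != sender_id:
                inbox.append((sender_id, message))

    def receive(self, meeting_id, peer_id):
        # Drain and return whatever has been queued for this peer.
        inbox = self.rooms[meeting_id][peer_id]
        messages = list(inbox)
        inbox.clear()
        return messages
```

The relay never looks inside the messages: offers, answers and ICE candidates are all opaque payloads to it, which is why this part stays so small.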

You should also consider that the signaling info (called the SDP) does not have a set lifetime; in some cases it can be valid indefinitely and in others for less than a minute, so if you encode the SDP in the URL you:

1. Can't set up a meeting URL beforehand, which is how most people want them to work.

2. Need a back-and-forth over some other medium like email/chat/pigeon.


But it's a very similar flow to what you'd normally do with a URL, no?

- You'd send someone a link such as http://chat.com/#id

- The other person opens http://chat.com/#id, which redirects to another URL, and this other URL (http://chat.com/#someOtherId) is sent back to the creator

- The creator clicks on the link again

So it's just adding an extra step where the owner also needs to click on a new URL. But I agree, since the signalling server is rather dumb, this can probably also be outsourced to a "public signalling server"?


Yeah, that is possible, but my point is that it sorta breaks how people are used to joining these kinds of meetings. The back and forth required is not the expectation most people have.

If you have more people, then you'd need to do this once per person (after that they can gossip the SDP over data channels to find the other participants).

Usual flow:

1. I and other people go to meet.example/DiscussImportantStuff

Your proposed flow:

1. I go to meet.example/DiscussImportantStuff (and it generates my SDP in the background and appends it to the URL)

2. I send meet.example/DiscussImportantStuff#MySDPHere to my friend

3. He goes to meet.example/DiscussImportantStuff#MySDPHere (and it generates an answer SDP and puts it in the URL in place of mine)

4. He sends me back meet.example/DiscussImportantStuff#FriendsSDPHere

5. I go to that link and we are connected.

Repeat steps 2-5 for each participant.
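
For what "appends it to the URL" could look like in the proposed flow, here is a small Python sketch that packs a session description into a URL fragment and unpacks it again. The function names and the JSON-plus-base64url encoding are assumptions made for illustration; the thread does not say how the linked demo encodes its SDP.

```python
import base64
import json


def offer_to_url(base_url, description):
    """Append a session description to base_url as a URL-safe fragment."""
    payload = json.dumps(description, separators=(",", ":")).encode("utf-8")
    # base64url without padding keeps the fragment free of '+', '/' and '='.
    token = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
    return f"{base_url}#{token}"


def url_to_description(url):
    """Recover the session description from a URL built by offer_to_url."""
    _, _, token = url.partition("#")
    padded = token + "=" * (-len(token) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

A real offer runs to many lines of SDP, so the resulting link gets long, and it has to be produced and sent once in each direction, which is the back-and-forth in steps 2-5.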

Considering how little technical complexity is saved and that you still need to have some sort of communication channel set up I don't think the proposed flow is worth it.


Indeed that makes sense!

Curious though if the "signalling server" can be abstracted away the same way STUN servers are: put a few URLs of signalling servers in the client app and it would choose whichever. They would all need the same echoing capabilities.

The point is to not have to maintain or spin up any servers when developing WebRTC apps, making them fully autonomous.

Something like this could also be pushed further to develop some sort of DHT around WebRTC, so that the process of finding "signalling servers" can be made even more self-sufficient if the hardcoded URLs in the client code are all offline.

Just a thought.


EDIT: tried to be more clear by condensing the comment into two questions:

1. How can you rely on trust that is established over an untrusted channel when you have no previous store of trust?

2. How can you verify identity when you have no trust and it is communicated over an untrusted proxy (the signaling server)?

STUN/TURN play no part in establishing the trust between the parties; they just facilitate it by acting as a lookup service or a forwarding service. The signaling has to be trusted for the communication to be trusted.

--- Original comment: ---

One problem is that signaling is pretty specific to the app using it. For example, how a meeting is determined and how it is used is very different between Zoom, Google Meet, Slack and so on. There is also a question of trust: in a P2P WebRTC flow you can have end-to-end encryption, but it still requires you to trust the signaling (since you have no way to communicate trust before signaling).

For purposes where you have a previous channel to communicate trust, you probably don't need signaling via a third party, and for purposes where you don't have a channel, you probably couldn't trust the signaling party if it was just an open relay on the net.

For the "free signaling server" to be a good solution it would first have to handle the problem of proven identity, which is something that even Facebook, with billions of users, has a problem with.

A DHT solves a very different problem than identity. The problem is not being able to speak to or address a user; the problem is being able to speak to the right user. WebRTC provides a channel to do that if you point it to the right user. Our problem is finding that right user in a secure, smooth way.


You can signal in more ways than this: email, QR code or any method that can transfer a small amount of data [1]. You do need to think of the security of this channel though as it forms the basis of authenticating the other party through the fingerprint attribute in the signaled SDP payload [2].

[1] https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/...

[2] https://rtcweb-wg.github.io/security-arch/
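
To show where that fingerprint lives, here is a minimal Python sketch that pulls the `a=fingerprint:` attributes out of an SDP blob so they can be compared against a value received over a channel you already trust. The function name is hypothetical, and a real application would normally use a proper SDP parser.

```python
def extract_fingerprints(sdp):
    """Return (hash function, fingerprint) pairs found in an SDP string."""
    prefix = "a=fingerprint:"
    fingerprints = []
    for line in sdp.splitlines():
        if line.startswith(prefix):
            # Attribute format: a=fingerprint:<hash-func> <hex bytes joined by ':'>
            hash_func, _, digest = line[len(prefix):].partition(" ")
            fingerprints.append((hash_func.lower(), digest.strip().upper()))
    return fingerprints
```

If whoever carries the SDP can rewrite this line, they can substitute their own certificate, which is why the security of the signaling channel matters.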


I did it via WhatsApp and copying data from TextAreas the first time I played around with it.


For my project that has a server side peer connection, I use a simple HTTP POST from the browser via fetch with the offer that responds with the answer. Then as I need renegotiation, I send the offer from the server side over the data channel and wait for the client side answer back over the data channel. Works fine.


> Obviously, one would want the data channel (last option). There's some work done in this area. However, it's still not full featured.

How would that even work? You need signaling to set up the data channel, so how could you handle signaling over a channel that isn't set up?


In a 1-to-1 setting, it obviously doesn't work. But in a mesh setting with n connected peers, when you want to add a new peer, you can perform the first signaling step via a server to connect with one member of the mesh, and then do all the subsequent signaling through this peer. This way, to set up a mesh with n users you only need n-1 signaling messages processed by your server (down from n(n-1)/2 if you perform everything through a centralized signaling server).

In practice, I know nobody who does that though. I don't think the extra complexity is worth it if your meshes are never really big, and with WebRTC you rarely encounter situations where you have more than a few peers in the same mesh (Google Chrome even used to struggle a lot if you had more than a few data channels open on the same page, while Firefox handled hundreds of connections without issue).
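
The n-1 versus n(n-1)/2 figures can be checked with a small Python sketch that grows a full mesh one peer at a time and counts how each new connection gets signaled. It counts one "exchange" per connection rather than individual offer/answer messages, and the function name is made up for illustration.

```python
def count_signaling_exchanges(n):
    """Count per-connection signaling exchanges while building a full mesh of n peers.

    Returns (centralized, bootstrap_via_server, relayed_via_peers):
    - centralized: every connection is signaled through the server
    - bootstrap_via_server: only each newcomer's first connection uses the server
    - relayed_via_peers: the newcomer's remaining connections go over the mesh
    """
    centralized = 0
    bootstrap_via_server = 0
    relayed_via_peers = 0
    for existing in range(1, n):  # peers already in the mesh when the next one joins
        centralized += existing            # one exchange per existing peer
        bootstrap_via_server += 1          # a single server-mediated exchange
        relayed_via_peers += existing - 1  # the rest ride on data channels
    return centralized, bootstrap_via_server, relayed_via_peers


print(count_signaling_exchanges(5))  # (10, 4, 6)
```

The total number of connections is the same either way; the mesh approach only moves most of the signaling off the server.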


Ah, yeah, I was thinking of the initial connection.

I guess the mesh needs to be pretty small so that you can have a fully connected mesh or you'd need some way to deal with netsplits and a gossip-style protocol for discovery of new nodes, right?


> WebRTC's Data Channel

Whoops turtles..


How's the native Android / iOS integration story for Pion? Are there any Java / Kotlin / Swift bindings around?


I have the same question. If it doesn't work on iOS it's not a cross platform stack. iOS is the third most popular operating system in the world after Android and Windows.


Looks cool. Does it have the capability to do SFU, or is there planned work to include that capability?


They have this SFU example but I didn't look at it closely yet: https://github.com/pion/example-webrtc-applications/tree/mas...


I'd like to stream high-quality, low-latency audio from a C++ app alongside webcam/mic from Chrome/Firefox. The receiver should get video and mixed audio. Maybe even screenshare and mouse/keyboard control. Is this possible?


Yes. Maybe look at GStreamer.


Has anyone noticed "Modern and Cross Platform" usually means Golang?


This industry is insanely fad driven. Every aspect of it. Terminology, tech stacks and methodologies are all one fad after another.


Lots of products, including ours, are running on Go. It's not a fad, it's a useful tool. Not the best programming language in the world, but it still allows you to produce great results in a short time. Kind of like what Delphi used to be for the Windows desktop.


I’m not sure Go is what I’d call a fad.

Is Java a fad? Its early years were much more hype driven than Go has been at any time during the 10 years Go has been around.

Python? JavaScript? Ruby? Rust?

What I like about the Go community is that it appears to be very measured. Unlike many other communities where the way a language is used becomes bigger than the language. Like Spring and Java. Rails and Ruby.

Or Rust and “look I reimplement X despite nobody giving two shits”. :-)

(My big disappointment with Rust is that it hasn’t found its niche. I’d like that niche to be embedded operating systems. We could certainly use Rust there, because C, C++ and the utterly shit Python-junk used to stitch things together is just painful. Competing for attention in server development or any other area that has very strong contenders is clearly an uphill battle.)


Go is over a decade old, not counting a couple of years of development in private. I'm not the biggest fan of the language but it's hard to think of it as a fad.


Fad or not, in terms of programming languages, ten years isn't such a long time. It typically takes a few years just to reach the ecosystem maturity and community size needed to become a popular language. Rust is also ca. ten years old and I'd argue that it still hasn't reached its full potential.


The language absolutely was a fad for a while. The shiny new thing everyone was moving to. That is how languages get popular. But languages, unlike terminology, don't just disappear.


> Ship to Mobile, Desktop, Servers and WASM all with one code base.

Dumb question: what would a WASM implementation be useful for? Browsers don't expose "raw" TCP or UDP connections, so even if you have a fully working WebRTC implementation in WASM, there would be no way to connect to anything when run inside a browser.

Or is this for WASM use outside the browser?


People are building server-side WASM runtimes, which I suspect is your answer. To me the other answer is they could provide the same API in the browser through a shim.



> they could provide the same API in the browser through a shim.

Oh, so in the browser, it would just fall back to the native WebRTC implementation? That would make sense.


What are the best resources for learning WebRTC at the minute?

It seems to be a pretty fast moving area, a lot of what I find out there is outdated.


MDN docs are very good.

https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API

For landscape, I also like the webrtchacks.com blog, and a very smart consultant runs a blog at bloggeek.me.


I wonder how Go's garbage collection and real-time data streams get along together.


Are there any WebRTC implementations in Java?



