> This isn't an adequate comparison in my eyes. Your example moves a very critical component out of the "protocol logic": message parsing.
(for context in case you didn't see the link in the other comments - https://gist.github.com/joshka/af299be87dbd1f64060e47227b577... is the full working implementation of this). It uses the tokio framed codec approach where message parsing effectively only succeeds once the full message is parsed. For stun that doesn't really seem like a big deal, but I can imagine that for other protos it might be.
> Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages.
The stream is a stream of Result<ParsedMessage, ErrorMessage>; that error message can represent the invalid message in whatever fashion makes sense for the protocol. The framed approach suggests that message framing is simpler to handle as an effectively linear state machine over bytes. Nothing stops you from treating the smallest unit of the protocol as a byte and handling that in the protocol handler, though; it still just comes down to a stream transformation from bytes to a stream of messages or errors.
Additionally, when working with the stream approach, there's usually a way to mutate the codec (e.g. UdpFramed::codec_mut()) which allows protocol level information to inform the codec how to treat the network byte stream if necessary.
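As a rough illustration of the Result-stream idea, here's a minimal std-only sketch (hypothetical `Message`/`FrameError` types and a made-up 4-byte framing, not tokio's actual `Decoder` trait) where the decode step yields either a parsed message or an error value that still carries the invalid bytes:

```rust
#[derive(Debug, PartialEq)]
enum Message {
    Ping,
}

#[derive(Debug, PartialEq)]
struct FrameError {
    // The invalid bytes survive into the error, so the protocol handler
    // can still decide what to do with them.
    raw: Vec<u8>,
}

// Analogous in shape to a codec's decode step: `None` means "incomplete
// frame, wait for more bytes"; a complete but invalid frame becomes an
// Err item on the stream rather than tearing anything down.
fn decode(buf: &mut Vec<u8>) -> Option<Result<Message, FrameError>> {
    if buf.len() < 4 {
        return None; // need more bytes before we can frame anything
    }
    let frame: Vec<u8> = buf.drain(..4).collect();
    if frame == b"PING" {
        Some(Ok(Message::Ping))
    } else {
        Some(Err(FrameError { raw: frame }))
    }
}
```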
> In addition, you want to be able to parse messages without copying, which will introduce lifetimes into this that will make it difficult to use an executor like tokio to run this future. You can use a structured-concurrency approach instead. I linked to an article from withoutboats about this at the end.
The Bytes crate handles the zero-copy part of this orthogonally. Lifetimes aren't necessary. A PoC of this is to just add a bytes field and push a ref to the bytes representing the STUN response, like:
let bytes = data.split_to(12).freeze();
Ok(Some(BindingResponse { address, bytes }))
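For illustration, here's a hedged std-only sketch of the mechanism Bytes relies on: reference-counted views into one allocation, so handing out sub-slices is cheap and no lifetimes leak into the API. The names and layout here are invented for the sketch, not the bytes crate's actual internals:

```rust
use std::sync::Arc;

// A toy stand-in for bytes::Bytes: both halves of a split share the same
// reference-counted allocation, so no bytes are copied.
#[derive(Clone)]
struct SharedBytes {
    data: Arc<[u8]>,
    start: usize,
    end: usize,
}

impl SharedBytes {
    fn new(data: Vec<u8>) -> Self {
        let end = data.len();
        SharedBytes { data: data.into(), start: 0, end }
    }

    // Mirrors the shape of BytesMut::split_to(..).freeze(): the returned
    // head and the remaining tail both point at the original allocation.
    fn split_to(&mut self, at: usize) -> SharedBytes {
        let head = SharedBytes {
            data: self.data.clone(), // bumps the refcount, copies nothing
            start: self.start,
            end: self.start + at,
        };
        self.start += at;
        head
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[self.start..self.end]
    }
}
```

Because ownership is tracked by the refcount rather than a borrow, a `BindingResponse { address, bytes }` carrying such a value is `'static` and can be moved freely between tasks.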
> Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.
I'm not sure that I see your point here. I think it's just that you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?
> EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines.
I wonder if you'd be able to expand on an example where these constraints you mentioned above come up. The Stun example is simple enough that it is achievable just with an async state machine. I'd be curious to work through a more detailed example (mostly for interest's sake).
In the article you mention the runtime overhead of mutexes / channels. I'm curious what sort of percentage of the full system time this occupies.
> The stream is a stream of Result<ParsedMessage, ErrorMessage>
Fair point! Yeah, that would allow error handling in the state machine.
> > Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.
> I'm not sure that I see your point here. I think it's just that you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?
1. Run N clients of the TURN protocol.
2. Run M connections (ICE agent + WireGuard tunnel).
3. Allow each connection to use each TURN client.
4. Everything must run over a single UDP socket (otherwise hole-punching doesn't work).
In theory, (3) could be relaxed a bit to not be an NxM matrix. It doesn't really change much though because one TURN allocation (i.e. remote address) needs to be usable by multiple connections.
All of the above is managed as a single `Node` that is composed into a larger state machine that manages ACLs and uses this library to route IP packets to and from a TUN device. That is where the concurrent access comes in: We receive ACL updates via a WebSocket and change the routing logic as a result. Concurrently, we need to read from the TUN device and work out which WireGuard tunnel the packet needs to go through. A WireGuard tunnel may use one of the TURN clients to relay the packet in case a direct connection isn't possible. Lastly, we also need to read from the UDP socket, index into the correct connection based on the IP and decrypt incoming IP packets.
In summary, there are lots of state updates happening from multiple events and the state cannot be isolated into a single task where `&mut` would be an option. Instead, we need `&mut` access to pretty much everything upon each UDP packet from the network or IP packet from the TUN device.
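A minimal sketch of the single-owner shape being described (all names here are hypothetical, nothing like the real codebase): every event funnels through one method, so `&mut self` covers all shared state without mutexes or channels:

```rust
struct Tunnel {
    peer: u32,
    relayed: bool,
}

struct Node {
    routes: Vec<(u32, usize)>, // stand-in for ACL/routing state
    tunnels: Vec<Tunnel>,      // stand-in for WireGuard tunnels
}

enum Event<'a> {
    AclUpdate(u32, usize), // from the WebSocket
    TunPacket(&'a [u8]),   // from the TUN device
    UdpPacket(&'a [u8]),   // from the UDP socket
}

impl Node {
    // One entry point for every event source: because there is a single
    // owner, updating routing state, tunnels and connections all happens
    // under the same `&mut` borrow.
    fn handle(&mut self, event: Event) {
        match event {
            Event::AclUpdate(dst, tunnel) => self.routes.push((dst, tunnel)),
            Event::TunPacket(_pkt) => { /* pick tunnel, encrypt, maybe relay */ }
            Event::UdpPacket(_pkt) => { /* index into connection, decrypt */ }
        }
    }
}
```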
> The Stun example is simple enough that it is achievable just with an async state machine.
I am fully aware of that. It needed to be simple to fit into the scope of a blog post that can be understood by people without a deep networking background. To explain sans-IO, I deliberately wanted a simple example so I could focus on sans-IO itself and not explaining the problem domain. With the basics in place, people can work out by themselves whether or not it is a good fit for their problem. Chances are, if they are struggling with lots of shared state between tasks, it could be!
Thinking about your example again made me think of another requirement: Multiplexing.
STUN ties requests and responses together with a transaction ID that is encoded as an attribute in the message.
Modelling this using `Sink`s and `Stream`s is quite difficult. Sending out the request via a `Sink` is easy but how do you correctly dispatch the incoming response again?
For example, let's say you are running 3 TURN clients concurrently in 3 async tasks that use the referenced `Sink` + `Stream` abstractions. How do you go from reading from a single UDP socket to 3 streams that each contain the correct responses based on the transaction ID that was sent out? To read the transaction ID, you'd have to parse the message. We've established that we can do that partially outside and only send the `Result` in. But in addition to the 3 tasks, we'd need an orchestrating task that keeps a temporary state of sent transaction IDs (received via one of the `Sink`s) and then picks the correct `Stream` for the response.
It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.
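As a sketch of what that orchestrating task would have to track (all types invented for illustration), its temporary state is essentially a map from transaction ID to the client whose `Stream` should receive the response:

```rust
use std::collections::HashMap;

type TransactionId = [u8; 12]; // STUN transaction IDs are 96 bits

struct Demultiplexer {
    // transaction ID -> index of the TURN client that sent the request
    in_flight: HashMap<TransactionId, usize>,
}

impl Demultiplexer {
    fn new() -> Self {
        Demultiplexer { in_flight: HashMap::new() }
    }

    // Observed on the Sink side: remember who sent which request.
    fn on_request(&mut self, id: TransactionId, client: usize) {
        self.in_flight.insert(id, client);
    }

    // Called per datagram from the single UDP socket: which client's
    // Stream should get this response, if any?
    fn route_response(&mut self, id: &TransactionId) -> Option<usize> {
        self.in_flight.remove(id)
    }
}
```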
The way I look at async in this case is that it kind of does the opposite of the inversion of control (in the original sense[1], not the Java/C# IoC container sense) that you're doing here. Your approach replaces a linear method with a driver that calls into a struct that explodes out each line of the method. An async method basically does the same, but it means your logic doesn't have to handle the polling, orchestration, selection, ordering, etc. manually. It doesn't necessarily mean, however, that you have to throw away your full state machine / structs and implement everything in a single method.
I don't think UdpFramed would fully work in your example, as it only works in terms of a single destination address for sending. I'd write a similar FramedMultiplexer implementation that combines a stream and a sink. I think this is probably what Node[1] already is, effectively.
The stream is of either bytes / packets, or properly decoded messages (e.g. `enum Message { WireguardMessage, TurnMessage, StunMessage, IceMessage }`)
The sink is of a tuple `(SocketAddr, Message)` (i.e. effectively your Transmit struct), which is hooked up externally to a UdpSocket. It basically aggregates the output of all the allocation / agent / connections.
The *_try_handle stuff doesn't really change much - it's still a lookup in a map to choose which allocation / agent / connection to write to.
> How do you go from reading from a single UDP socket to 3 streams that each contain the correct responses based on the transaction ID that was sent out
I think right now you're sending the response to the allocation that matches the server address[3] and then checking whether that allocation has the transaction ID stored[4]. I guess you miss the ability to return false if the input to the allocation is a stream rather than just a method, but that doesn't preclude writing a function that's effectively the transaction-check part of the allocation, while still having your allocation inner loop just be read-from-stream, write-to-sink (if the transaction is one we sent, send the message to the allocation's task and let it handle it; otherwise return false).
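A rough sketch (invented stand-in types, not the actual Allocation from the codebase) of factoring that transaction check out as its own function while the allocation's inner loop stays a plain read-from-stream, write-to-sink:

```rust
use std::collections::HashSet;

struct Allocation {
    pending: HashSet<[u8; 12]>, // transaction IDs we have sent out
    received: Vec<Vec<u8>>,     // stand-in for "forward to this allocation's task"
}

impl Allocation {
    // The transaction-check part, factored out: returns true if this
    // allocation owns the transaction and consumed the message, false so
    // the caller can try the next allocation.
    fn try_handle(&mut self, transaction_id: &[u8; 12], msg: &[u8]) -> bool {
        if self.pending.remove(transaction_id) {
            self.received.push(msg.to_vec());
            true
        } else {
            false
        }
    }
}
```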
> It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.
I read through the node code and it feels a bit spaghetti to me as an outsider, because of the sans-io abstractions, not in spite of them. That comes from the nature of the driving code being external to the code implementing the parts.
The counter point to the testing is that you're already doing this orchestration and need to test it. I'm not sure why having any of this as async changes that. The testing should be effectively "If a turn message comes in, then the right turn client sees the message" - that's a test you'd need to write regardless right?
[1]: https://en.wikipedia.org/wiki/Inversion_of_control (Incidentally, there's a line in the article where you call out the dependency inversion principle, but this is a bit more accurate a description of what's happening - they are related however).
> An async method, basically does the same, but it means your logic doesn't have to handle the polling, orchestration, selection, ordering, etc.
In order to take advantage of that, I'd need to create one `async` method for each handshake with the TURN server, right? Like every `allocate`, every `refresh`, every `bind_channel` would need to be its own async function that takes in a `Sink` and `Stream` in order to get the benefit of having the Rust compiler generate the suspension points for me upon waiting on the response. Who is driving these futures? Would you expect these to be spawned into a runtime or would you use structured concurrency a la `FuturesUnordered`?
A single `Allocation` sends multiple of these requests and, in the case of `bind_channel`, concurrently. How would I initialise them? As far as I understand, I would need to be able to "split" an existing `Stream` + `Sink` pair by inserting another multiplexer in order to get a `Stream` that will only return responses for the messages that have been sent via its `Sink`, right?
I just don't see the appeal of all this infrastructure code to "gain" the code-gen that the Rust compiler can do for these request-response protocols. It took me a while to wrap my head around how `Node` should work, but the state isn't that difficult to manage because it is all just request-response protocols. If the protocols had more steps (i.e. more `.await` points in an async style), then it might get cumbersome and we'd need to think about an alternative.
> > It is doable but creates the kind of spaghetti code of channels where concerns are implemented in many different parts of the code. It also makes these things harder to test because I now need to wire up this orchestrating task with the TURN clients themselves to ensure this response mapping works correctly.
> I read through the node code and it feels a bit spaghetti to me as an outsider, because of the sans-io abstractions, not in spite of them.
> That comes from the nature of the driving code being external to the code implementing the parts.
It is interesting you say that. I've found that I never need to think about the driving code because it is infrastructure that is handled on a completely different layer. Instead, I can focus on the logic I want to implement.
The aspect I really like about the structure is that the call-stack in the source code corresponds to the data flow at runtime. That means I can mentally trace a packet via "jump to definition" in my editor. To me, that is the opposite of spaghetti code. Channels and tasks obfuscate the data flow and you end up jumping between definition site of a channel, tracing where it was initialised, followed by navigating to where the other end is handled etc.