Reading through the paper I can't see what the authors mean by "reliable transmission" there, or how they achieve it.
I only see them referencing increased socket buffers, which - in combination with the available (and non-congested) network bandwidth and their app's sending behavior - led to no transmission errors. As soon as you change any of those parameters it seems like the system would break down, and they have absolutely no measures in place to "make it reliable".
The right answer still seems to be: implement a congestion controller, retransmits, etc. - which essentially ends up reimplementing TCP/SCTP/QUIC/etc.
Having end-to-end control of their topology in production is the measure they're using to make it reliable. Since they're already saturating the link, the sender physically cannot burst any faster and overrun the receiver, so the tuned receiver parameters are reasonably robust.
Retransmit-based systems are probably unusable in this application: even over the short hop, the bandwidth-delay product is probably much bigger than the buffer on the sensor. The only case where a retransmit would happen is receiver buffer overflow, which is catastrophic: the retransmit would only cause even more overflow.
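For scale, a back-of-the-envelope sketch; both numbers are illustrative assumptions, not figures from the paper:

    # How much in-flight data a sender would have to hold on to in order to be
    # able to retransmit it. Both numbers below are assumptions for illustration.
    link_rate_bps = 10e9   # assume a 10 Gbit/s link
    rtt_s = 100e-6         # assume a 100 microsecond round-trip time
    bdp_bytes = link_rate_bps * rtt_s / 8
    print(f"bandwidth-delay product: {bdp_bytes / 1024:.0f} KiB")  # ~122 KiB

Even with a sub-millisecond RTT that is on the order of a hundred kilobytes of unacknowledged data to keep around, which per the point above is likely more buffer than the sensor front-end has to spare.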
If you had to fix random packet loss in a system like this you wouldn't want to use retransmission; you'd need to do FEC.
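To make that concrete: the simplest possible FEC scheme is one XOR parity packet per group of k data packets, which lets the receiver rebuild any single lost packet in the group without a retransmit. A minimal sketch, where the group size and equal-length padding are my assumptions, not anything from the paper:

    import functools

    K = 8  # assumed group size: one parity packet per 8 data packets

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        # Packets in a group are assumed to be padded to equal length.
        return bytes(x ^ y for x, y in zip(a, b))

    def make_parity(group: list) -> bytes:
        # Sender side: XOR all K data packets together into one parity packet.
        return functools.reduce(xor_bytes, group)

    def recover_one(group: list, parity: bytes):
        # Receiver side: 'group' holds the received packets, with None at the
        # position that was lost. XORing the parity with everything that did
        # arrive reconstructs at most one missing packet.
        missing = [i for i, p in enumerate(group) if p is None]
        if len(missing) != 1:
            return None  # nothing missing, or more than one loss: parity can't help
        received = [p for p in group if p is not None]
        return functools.reduce(xor_bytes, received, parity)

The trade-off is a fixed 1/K bandwidth overhead and no recovery when two packets in the same group are lost; real systems reach for Reed-Solomon or fountain codes when that isn't enough.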
If you have a very low error rate line, the main point at which packet loss will occur for UDP is on the receiving system. If the receive buffer isn't large enough, it can fill up while the receiving app is busy doing other things, and then packets will be dropped.
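That receiver-side tuning is roughly a one-liner; a minimal sketch, where the 32 MiB figure and the port are arbitrary examples (and on Linux the kernel also clamps the request to net.core.rmem_max, so that sysctl usually has to be raised as well):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a large receive buffer so bursts survive scheduling hiccups in the
    # receiving app. 32 MiB is an arbitrary example, not a value from the paper.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 32 * 1024 * 1024)
    sock.bind(("0.0.0.0", 9000))  # hypothetical port
    # The kernel may silently clamp the request; check what was actually granted.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print(f"receive buffer granted: {granted} bytes")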
Yes, but they didn't do anything to make UDP reliable. They just said that in their test scenario they didn't notice any loss at the application layer after increasing the socket receive buffer, and called it a day, because elsewhere in the paper they noted: "For some detector readout it is not even evident that guaranteed delivery is necessary. In one detector prototype we discarded around 24% of the data due to threshold suppression, so spending extra time making an occasional retransmission may not be worth the added complexity."
I think the paper meant "reliable" in a different way than most would take "reliable" to mean in a paper about networking. It's similar to someone writing a paper on "Achieving an asynchronous database for timekeeping" and spending a lot of time talking about databases, when it turns out that by "asynchronous" they meant you could enter your hours at the end of the week rather than the moment you walked in or out of the door.
I just think they meant "reliable" in the sense of "how to dimension the system to greatly reduce the possible loss". No protocol is 'fully' reliable in all dimensions (latency, message loss, throughput). Sometimes you benchmark your exact physical configuration and add large margins, add some packet-loss detection mechanism, possibly retries (but if your latency requirements are hard, no dice), or duplicate the physical layer (oh god, de-duplication at 10GbE...), or just accept some losses.
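The cheapest loss-detection mechanism is just a sequence number in each datagram plus gap counting on the receive side. A rough sketch with a made-up header layout (not anything from the paper), and no handling of reordering:

    import struct

    HEADER = struct.Struct("!Q")  # assumed 8-byte big-endian sequence number prefix

    def frame(seq: int, payload: bytes) -> bytes:
        # Sender side: prepend the sequence number to each datagram.
        return HEADER.pack(seq) + payload

    class LossCounter:
        """Counts missing sequence numbers on the receive side."""
        def __init__(self) -> None:
            self.expected = 0
            self.lost = 0

        def on_datagram(self, datagram: bytes) -> bytes:
            (seq,) = HEADER.unpack_from(datagram)
            if seq > self.expected:          # gap => datagrams went missing
                self.lost += seq - self.expected
            self.expected = seq + 1
            return datagram[HEADER.size:]    # strip the header, return payload

This doesn't make anything reliable by itself, but it turns "we didn't notice any loss" into a measured loss rate you can dimension against.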
Reliability in the context of networking protocols means a specific thing to me - guaranteeing packet delivery (to the extent that it is physically possible of course).
If the authors of the paper are using a term that already has a specific meaning in the area they are working in, but mean something different by it, then they are making a mistake.
Sounds like they got the wrong protocol for it then. UDP is not meant to be "reliable"; it's fire and forget. Not sure why anyone would implement TCP on top of UDP.