Reading through the paper I can't see what the authors mean by "reliable transmission" there, or how they achieve it.
I only see them referencing increased socket buffers, which - in combination with the available (and non-congested) network bandwidth and their app's sending behavior - led to no transmission errors. As soon as you change any of those parameters it seems like the system would break down, and they have absolutely no measures in place to "make it reliable".
The right answer still seems to be: implement a congestion controller, retransmits, etc. - which essentially ends up reimplementing TCP/SCTP/QUIC/etc.
Having end-to-end control of their topology in production is the measure they're using to make it reliable. Since they're already saturating the link, the sender physically cannot burst any faster and overrun the receiver, so the tuned receiver parameters are reasonably robust.
Retransmit-based systems are probably unusable in this application: even over the short hop, the bandwidth-delay product is probably much bigger than the buffer on the sensor. The only case where a retransmit would happen is receiver buffer overflow, which is catastrophic: the retransmit would only cause even more overflow.
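For scale, a back-of-the-envelope sketch; both numbers are illustrative assumptions, not figures from the paper:

    # How much in-flight data a sender would have to hold on to in order to be
    # able to retransmit it. Both numbers below are assumptions for illustration.
    link_rate_bps = 10e9   # assume a 10 Gbit/s link
    rtt_s = 100e-6         # assume a 100 microsecond round-trip time
    bdp_bytes = link_rate_bps * rtt_s / 8
    print(f"bandwidth-delay product: {bdp_bytes / 1024:.0f} KiB")  # ~122 KiB

Even with a sub-millisecond RTT that is on the order of a hundred kilobytes of unacknowledged data to keep around, which per the point above is likely more buffer than the sensor front-end has to spare.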
If you had to fix random packet loss in a system like this you wouldn't want to use retransmission; you'd need to do FEC.
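To make that concrete: the simplest possible FEC scheme is one XOR parity packet per group of k data packets, which lets the receiver rebuild any single lost packet in the group without a retransmit. A minimal sketch, where the group size and equal-length padding are my assumptions, not anything from the paper:

    import functools

    K = 8  # assumed group size: one parity packet per 8 data packets

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        # Packets in a group are assumed to be padded to equal length.
        return bytes(x ^ y for x, y in zip(a, b))

    def make_parity(group: list) -> bytes:
        # Sender side: XOR all K data packets together into one parity packet.
        return functools.reduce(xor_bytes, group)

    def recover_one(group: list, parity: bytes):
        # Receiver side: 'group' holds the received packets, with None at the
        # position that was lost. XORing the parity with everything that did
        # arrive reconstructs at most one missing packet.
        missing = [i for i, p in enumerate(group) if p is None]
        if len(missing) != 1:
            return None  # nothing missing, or more than one loss: parity can't help
        received = [p for p in group if p is not None]
        return functools.reduce(xor_bytes, received, parity)

The trade-off is a fixed 1/K bandwidth overhead and no recovery when two packets in the same group are lost; real systems reach for Reed-Solomon or fountain codes when that isn't enough.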
If you have a very low error rate line, the main point at which packet loss will occur for UDP is on the receiving system. If the receive buffer isn't large enough, it can fill up while the receiving app is busy doing other things, and then packets will be dropped.
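That receiver-side tuning is roughly a one-liner; a minimal sketch, where the 32 MiB figure and the port are arbitrary examples (and on Linux the kernel also clamps the request to net.core.rmem_max, so that sysctl usually has to be raised as well):

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Ask for a large receive buffer so bursts survive scheduling hiccups in the
    # receiving app. 32 MiB is an arbitrary example, not a value from the paper.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 32 * 1024 * 1024)
    sock.bind(("0.0.0.0", 9000))  # hypothetical port
    # The kernel may silently clamp the request; check what was actually granted.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print(f"receive buffer granted: {granted} bytes")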
Yes, but they didn't do anything to make UDP reliable. They just said that in their test scenario they didn't notice any loss at the application layer after increasing the socket receive buffer, and called it a day, because elsewhere in the paper they noted: "For some detector readout it is not even evident that guaranteed delivery is necessary. In one detector prototype we discarded around 24% of the data due to threshold suppression, so spending extra time making an occasional retransmission may not be worth the added complexity."
I think the paper meant "reliable" in a different way than most would take "reliable" to mean in a paper about networking. It's similar to someone writing a paper on "Achieving an asynchronous database for timekeeping" and spending a lot of time talking about databases, when it turns out that by "asynchronous" they meant you could enter your hours at the end of the week rather than the moment you walked in or out of the door.
I just think they meant "reliable" in the sense of "how to dimension the system to greatly reduce the possible loss". No protocol is 'fully' reliable in all dimensions (latency, message loss, throughput). Sometimes you benchmark your exact physical configuration and add large margins, add some packet-loss detection mechanism, possibly retries (but if your latency requirements are hard, no dice), or duplicate the physical layer (oh god, de-duplication at 10GbE...), or just accept some losses.
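The cheapest loss-detection mechanism is just a sequence number in each datagram plus gap counting on the receive side. A rough sketch with a made-up header layout (not anything from the paper), and no handling of reordering:

    import struct

    HEADER = struct.Struct("!Q")  # assumed 8-byte big-endian sequence number prefix

    def frame(seq: int, payload: bytes) -> bytes:
        # Sender side: prepend the sequence number to each datagram.
        return HEADER.pack(seq) + payload

    class LossCounter:
        """Counts missing sequence numbers on the receive side."""
        def __init__(self) -> None:
            self.expected = 0
            self.lost = 0

        def on_datagram(self, datagram: bytes) -> bytes:
            (seq,) = HEADER.unpack_from(datagram)
            if seq > self.expected:          # gap => datagrams went missing
                self.lost += seq - self.expected
            self.expected = seq + 1
            return datagram[HEADER.size:]    # strip the header, return payload

This doesn't make anything reliable by itself, but it turns "we didn't notice any loss" into a measured loss rate you can dimension against.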
Reliability in the context of networking protocols means a specific thing to me - guaranteeing packet delivery (to the extent that it is physically possible of course).
If the authors of the paper are using a term that already has a specific meaning in the area they are working in, but mean something different by it, then they are making a mistake.
Sounds like they got the wrong protocol for it then. UDP is not meant to be "reliable"; it's fire and forget. Not sure why anyone would implement TCP on top of UDP.