Speedbump – a TCP proxy to simulate variable network latency

diggan · on Jan 16, 2024

I looked into doing something similar for testing various ActivityPub implementations in various network sizes and conditions.

Turns out, I already had everything installed on my machine to do it, via `tc` (Explained a bit here: https://wiki.archlinux.org/title/advanced_traffic_control), which apparently came with the iproute2 package on my distribution.

With that, you'd be able to run something like this to add latency on a specific interface:

    tc qdisc add dev eth0 root netem delay 100ms

Really easy to use, works well in docker container too, comes with a bunch of different conditions you can apply (delay, packet loss, duplication) and you might just have it installed already.

camtarn · on Jan 16, 2024

Yeah, tc/netem/tbf is amazing. I built a simple Python GUI over the top of it and ran it on a Pi in a touchscreen case - just a little black box with "Drop packets: [0%] [1%] [10%] [50%] / Corrupt packets: ..." etc. Clients were very impressed.

I'm actually quite surprised that a similar frontend doesn't seem to exist as a commercial hardware product, unless we missed one in our search.

BiteCode_dev · on Jan 16, 2024

You have a product here.

lifeisstillgood · on Jan 16, 2024

Yeah - I was looking around this week and think there is a “gap” for a good simple proxy - speed bump is nice but I am thinking just let my application do it’s thing, then let me replay certain traffic, that JSON needs to

I think a programmable proxy with simple hooks is what I want

jandrese · on Jan 16, 2024

The downside of tc is that it is kinda weird and tricky to get it to operate on incoming packets.

I had to write my own emulator once to simulate a specific commercial satellite terminal. The terminal had the behavior of queuing up packets until they hit a certain threshold or exceeded a time limit and then bursting them out in a big blob. It also "helpfully" reordered small packets to the front of the queue for better latency, which made TCP stacks very cross.

bogomipz · on Jan 17, 2024

I thought tc only worked on outbound. How did you get it to work on inbound packets?

jandrese · on Jan 17, 2024

There is a way to do it by setting up another queue and redirecting inbound packets into that queue, but I don't remember the specifics. I do recall quite a bit of trouble getting it to work. If you search for it you should be able to find some topics on the subject.

lttlrck · on Jan 16, 2024

The nice thing about speedbump is the ability to modulate the impairments. tc cannot do that.

That could be really useful for maybe simulating weather effects on satellite/RF over time?

tussa · on Jan 16, 2024

tc can simulate various network issues: https://man7.org/linux/man-pages/man8/tc-netem.8.html

camtarn · on Jan 16, 2024

You could easily build something that could repeatedly call the tc command and feed it different levels of impairment. Doing so to simulate weather is a great idea. If you had a wave-rider buoy measuring wave height every half-second or so, you could even simulate the impairment of an RF connection over water due to waves.

jedberg · on Jan 16, 2024

This is what we built at Netflix and called latency monkey.

It turns out determining that a downstream service is slow is a lot harder than determining if it's unavailable, so it was an important way for us to test how services handle slowdowns and network problems as well.

It was really simple, it just dropped some configureable percentage of packets, which would force a resend, causing the other side to get packets delayed and out of order.

It ended up finding a lot of problems in our error handling code for network access.

NelsonMinar · on Jan 16, 2024

Every software engineer working on interactive Internet applications should be required to use a tool like this in their daily work. Need QUIC as well as TCP, ideally all UDP to catch DNS too.

I'm convinced 90% of webapp bloat would go away if the people building them didn't have a gold plated Cadillac computing experience.

galleywest200 · on Jan 16, 2024

You can do this in Firefox with their Development Tools! I am sure Chrome has some form of equivalent in its Development Tools.

https://firefox-source-docs.mozilla.org/devtools-user/networ...

Admittedly this only works for front-end, browser based testing.

Flimm · on Jan 16, 2024

This only throttles the speed, it doesn't drop packets, AFAIK.

hunter2_ · on Jan 16, 2024

In a browser context, doesn't the HTTP stack sort out any issues in lower layers such as those stemming from packet loss (e.g., packets arriving out of order) such that what's presented to the client app (and debuggable by the developer thereof) is just an HTTP response or lack thereof? That lack thereof (or additional latency) could be due to packet loss, but it doesn't have to be, and it doesn't really matter because it's abstracted away.

burntcaramel · on Jan 17, 2024

The problem is it throttles everything by a constant factor. So if at full speed response A arrived after response B, it still most likely will, they will all just arrive slower.

It’s like the safety car in Formula 1: the cars will all slow down to crawl, but they won’t change order because they can’t overtake.

Better to test a scenario like in F1 where one car hits another, and then takes a few more out with them. That will really scatter the order and timings of things. That’s more how the real world of Wi-Fi and cellular connections works, so being able to handle that level of unpredictability is going to lead to way more robust apps and a better UX.

I have a dream of opening a co-working space for developers where you can connect to “Free Airport Wi-Fi” to really test your web application is going to survive in the real world.

youngtaff · on Jan 16, 2024

Chrome throttle’s at the individual request level, and not at the network level so can’t simulated loss due to contention e.g. downloading a multi-MB page over a 3G connection

irisgrunn · on Jan 16, 2024

Yeah, Chrome has the same feature

westurner · on Jan 16, 2024

Very many apps poorly perform with intermittent network connectivity, in Diaster Relief scenarios.

More app developers could help others by testing with simulated intermittent connectivity.

From "Toxiproxy is a framework for simulating network conditions" (2021) https://news.ycombinator.com/item?id=29084277#29088775 :

> Many apps lack 'pending in outbox' functionality that we expect from e.g. email clients.

> - [ ] Who could develop a set of reference toxiproxy 'test case mutators' (?) for simulating typical #DisasterRelief connectivity issues?

withinboredom · on Jan 16, 2024

My favorite is not filling buffers before sending the packet and then everything with a crappy internet connection (dropped packets + high latency = TCP retransmissions) is suddenly doing 120kps because you're only sending 50 byte packets and every 10th one is getting lost. Meanwhile, that's a thread on your server not doing useful work.

apitman · on Jan 16, 2024

Would you mind describing this problem more fully? I'm currently learning TCP more deeply and this sounds interesting but I don't quite understand the issue.

withinboredom · on Jan 17, 2024

There are two choice you have when sending data:

1. Wait until you have enough data to send in at least one packet (maximize throughput).

2. Send data as soon as you have it, even if it is less than a packet's worth. (minimize latency).

If you want to send a file, for example, you should use the first choice. A common or naive implementation might be to read the file in chunks of 9kb (about the size of a jumbo frame), send it to the socket, and then flush it. There are multiple problems with that approach.

1. Not everyone has jumbo frames, so if we had a common MTU of 1500 bytes on our link, that means you'd actually send ~6.25 packets worth of data.

2. Even if you were using jumbo frames, TCP headers use up that space. So, if you flush the entire frame's worth of data, you'd send something like 1.1 packets.

3. The filesystem might or might not give you the maximum bytes you request. If there is i/o pressure, you might only get back some random amount instead of the 9k you requested. So, if you flush the buffer to the TCP stack, you'd only send that random amount instead of what you assumed would be 9kb.

These mistakes generally send lots of small packets very quickly, which is fine when the link is fat and short. As soon as one packet gets lost, you're still sending a bunch of small packets, but now we have to stop and wait for the lost packet again before we can start handling more of them. So, if you are sending to someone on, say, airport/cafe wifi, they will have atrocious download speeds even though they have a pretty fat link (the amount of retransmissions due to wifi interference is a large % of the bandwidth). This is very similar to "head of line blocking" but on the link level.

I only had to learn about this fairly recently because my home is surrounded by a literal ton of wifi access points (over 20!) which causes quite a bit of interference. On wifi, I can get around 800mbps on the link but about every 10th packet needs to be retransmitted due to having so many access points around me (not to mention my region uses most of the 5G band for airport radar, so there are only a few channels available in that band). So, when applications/servers don't fill the buffers, I get about 50kpbs. When they do fill the buffers, I can get around 300mbps maximum, even though I have an 800mbps link.

Hope that helps.

westurner · on Jan 18, 2024

From "MOOC: Reducing Internet Latency: Why and How" https://news.ycombinator.com/item?id=37285586#37285830 : sqm-autorate, speedtest-netperf, netperf/iperf, flent dslreports_8dn

Would there be a benefit to autotuning with e.g. sqm-autorate in the suggested environments?

walth · on Jan 16, 2024

You can do the same thing on Mac with built-in tools

``` # Setup pipe sudo dnctl pipe 1 config bw 1Kbit/s delay 800

# Setup matching pf rule echo "dummynet out proto tcp from any to 127.0.0.1 port 11211 pipe 1" | sudo pfctl -f -

# Turn on firewall sudo pfctl -e

# Test time nc -vz 127.0.0.1 11211 Connection to 127.0.0.1 port 11211 [tcp/*] succeeded! nc -vz 127.0.0.1 11211 0.01s user 0.00s system 0% cpu 1.333 total ```

rhizome · on Jan 16, 2024

Dummynet and the rest of this functionality comes from FreeBSD, where it has existed for a long time. I was doing packet loss testing with it 15+ years ago. Works great!

jmspring · on Jan 16, 2024

It has been inactive for awhile, but the name says a lot - https://github.com/tylertreat/comcast

kayyyy · on Jan 16, 2024

amazing name

girishso · on Jan 16, 2024

Recently wanted to simulate slow network on Mac and came across Network Link Conditioner, it’s pretty good. No need to setup proxy or anything else. It needs to be installed from Xcode additional tools.

https://nshipster.com/network-link-conditioner/

Uptrenda · on Jan 16, 2024

Checkout also shopify's awesome tool called toxiproxy: https://github.com/Shopify/toxiproxy

It turns out to be also a very good way to test a networking library by implementing it. Since your stack needs to be able to basically handle most adverse events properly.

The idea behind 'chaos engineering' is cool.

sph · on Jan 16, 2024

Yes, I found toxiproxy first, but the client-server model didn't suit me, while speedbump is perfect for my use case — simulating HTTP latency.

I am developing a progress bar for a web crawler, and testing on localhost is too fast to notice if there is any issue. With speedbump, I just do `podman run --net=host kffl/speedbump:latest --latency=1s --port=8001 localhost:8000` and test my crawler on http://localhost:8001

Neat tool.

miry · on Jan 16, 2024

I believe a similar approach could be achieved using toxiproxy and configuration. The documentation at https://github.com/Shopify/toxiproxy#2-populating-toxiproxy provides insights into the process. While it may not be as straightforward as with speedbump, the flexibility of toxiproxy makes it a viable option.

Toxiproxy primarily serves the purpose of integration into tests. It proves particularly useful when testing features like the progress bar directly from code. Notably, for Go applications, integration is seamless, eliminating the need for an additional application. Instead, you can run the Toxiproxy server part directly from the codebase. An illustrative example can be found at https://github.com/Shopify/toxiproxy/blob/main/_examples/tes....

RajT88 · on Jan 16, 2024

A similar tool for Windows I have used:

https://jagt.github.io/clumsy/

baq · on Jan 16, 2024

+1, used it like a decade ago to test out various intercontinental scenarios, results matched reality, would recommend.

ComputerGuru · on Jan 16, 2024

Looks cool but judging purely by the screenshot it seems to be systemwide w/ filters rather than per-adapter?

RajT88 · on Jan 16, 2024

Correct, but you can use filters to determine which packets it targets. Source IP is one of the filter criteria, so presumably that would get you your adapter targeting.

wang_li · on Jan 16, 2024

FreeBSD also has dummynet as part of ipfw that allows injection of latency, bandwidth caps, queue sizes, and packet loss. Same thing as in MacOS.

e145bc455f1 · on Jan 16, 2024

Is that the same as tc in linux?

wang_li · on Jan 17, 2024

Not familiar with tc, but from an online search they seem similar.

INTPenis · on Jan 16, 2024

I'll never forget at my first job back in 2004 my manager configured our FreeBSD IPFW firewall so that it slowed down ICMP responses. So whenever someone pinged us it looked like we had the highest response times.

He was a joker that guy.

ethomson · on Jan 17, 2024

This is delightful and I can't wait to try it out. Right now, the libgit2 project (https://github.com/libgit2/libgit2) has a custom HTTP git server wrapper that will throttle the responses down to a very slow rate. It's fun watching a `git clone` running over 2400 baud modem speeds, but it's actually been incredibly helpful for testing timeouts, odd buffering problems, and other things that crop up in weird network environments.

I'd love to jettison our hacky custom code and use something off-the-shelf instead.

londons_explore · on Jan 16, 2024

I'd like this to go one layer deeper... Simulate a network with a variable latency, and see how typical TCP implementations behave over it.

Far too many realtime things, like games and videoconferencing (both of which frequently use TCP, despite it being badly suited to the application), perform really badly when the bandwidth increases and decreases, for example as I walk around a building with wifi and pass concrete pillars.

I want not a single dropped frame in those circumstances. Sure - you can have some lower res frames, but I don't expect to see 30 frames dropped in a row and a big glitch.

jsnell · on Jan 16, 2024

If you're specifically looking to probe the behavior of different TCP implementations, I think you want something slightly different than this. First, you want the faults to be deterministic (e.g. not random 1% packet loss but packet loss at a certain point in the connection); and you want the limits to be per TCP connection, not per link. If you don't have that determinism or connection isolation there are just going to be too many confounding variables in addition to the TCP stacks you're supposed to be testing.

I wrote a tool like that a few years back[0], but didn't publish most of what I discovered about e.g. how different CDNs tuned their TCP stacks (just a couple of anonymized examples[1]).

[0] https://github.com/teclo/flow-disruptor

[1] https://www.snellman.net/blog/archive/2015-10-01-flow-disrup...

ajsnigrutin · on Jan 16, 2024

You can do this with a linux router and 'tc'

https://www.pico.net/kb/how-can-i-simulate-delayed-and-dropp...

(just a short example)

laserbeam · on Jan 16, 2024

Can't you just do it locally with tc without a router? I have artificially throttled / simulated delays with tc before and I don't see why you need a router.

PS. My knowledge of tc is very rudimentary. I only had to use it once.

jrockway · on Jan 16, 2024

You don't need a router. I've used it on `lo` before.

derhuerst · on Jan 16, 2024

AFAIK mininet [0] can help you with this. It allows configuring latency and/or jitter dynamically using Python [1].

[0] https://mininet.org [1] https://mininet.org/api/classmininet_1_1link_1_1TCIntf.html#...

lelanthran · on Jan 16, 2024

> I want not a single dropped frame in those circumstances. Sure - you can have some lower res frames, but I don't expect to see 30 frames dropped in a row and a big glitch.

This is quite difficult, I think. You only know how bad the connection is after dropping a frame or two, and the intervening network will detect it before the endpoints do.

From an application level, I suspect that on the socket layer level you can't do anything about TCP anyway, the underlying layer will retransmit after a delay.

[EDIT: Just realised you meant video frames, and not network frames. I think my comment still applies though - dropping a single transmission unit that is part of an I video frame still means discarding that entire frame, whilst dropping a B or P frame is going to result in glitches.]

londons_explore · on Jan 16, 2024

With closer integration between the application and the network layers, you can see things like the RSSI dropping (basically signal strength). This can trigger the application to start sending less data before any actual data loss happens - and the application at the local end can respond in under a millisecond rather than a whole network roundtrip.

For the traffic from the remote end, data packets can consist of "here is the low res data" and another packet for "and here is the extra data to turn the low res into higher res". The high res would be sent with QoS markers so that network hardware knows that if anything is to be dropped, drop that first. The different QoS streams could even be sent with different transmit powers and modulation schemes on the WiFi network.

withinboredom · on Jan 16, 2024

RSSI goes up and down all the time for a variety of reasons. That's not really an indication of network speed though, only link speed. You can have a wifi connection over several km's if nobody else has wifi routers (in real life, they do, so you can't go much further than your house) and still have a really good link speed (from experience).

Network speed is a function of link speed AND the success of packets to reach you. When you are on wifi, your device gets a very tiny slice of time to send packets. If you have more devices, your device gets less time to send packets. Further, a "loud router" nearby (or self-interference ... or radar in 5G space -- or a microwave, or Bluetooth in 2G space) can cause a frame to be interfered with and never be received. Thus it has to be retransmitted.

RSSI is only a very small part of your overall network conditions and doesn't really tell anyone anything.

ysleepy · on Jan 16, 2024

IIRC FreeBSD had dummynet which allows creating slow, laggy and lossy virtual networks.

jrockway · on Jan 16, 2024

This little tool is showing its age: https://gfblip.appspot.com/ but as you walk around you'll probably notice that you aren't getting any packets through to the Internet at all. That's why your video drops out, you aren't connected to the Internet anymore.

What Zoom does is notice this and play the video frames in its buffer back more slowly in an attempt to smooth over the "no data coming from the Internet" period. I don't find this that helpful because if you comment on something someone just said, they've moved on to a new topic by the time Zoom unbuffers and displays all the frames. Clever hack, I prefer the complete dropout.

clbrmbr · on Jan 16, 2024

I wrote such a tool[1] for my own use some years ago. It uses libpcap to grab all the packets on one interface and then re-emit them from another interface after some delay, with optional packet loss.

If there’s real interest I could dust it off… maybe rewrite in Rust.

[1] https://github.com/chrismerck/lagsim

ahoka · on Jan 16, 2024

What do you think TCP should do when bandwidth lowers?

oneplane · on Jan 16, 2024

GNS3 can do that

TheGRS · on Jan 17, 2024

This seems like a decent thread to ask: how do you all throttle UDP traffic? We are doing some WebRTC work and would like to easily do this, but it seems like the only effective way to do it is with a dedicated hardware device or through your router. I was disappointed to find out that most network limiters only throttle TCP traffic.

dmarinus · on Jan 16, 2024

years back I've used linux netem for such a purpose, back then it was already fabulous

anthk · on Jan 16, 2024

Not latency, but tritty can simulate a 56k/ISDN-like connection speed.

qainsights · on Jan 16, 2024

Great utility, will test this today.