Investigating the impact of HTTP3 on network latency for search (dropbox.tech)
154 points by kiyanwang on May 22, 2023 | 64 comments



Related tangent: highest possible recommendation for https://hpbn.co/ (High-Performance Browser Networking), an extraordinarily helpful book and website.


I second this recommendation, but note that this book predates and does not cover HTTP3/QUIC.


Good disclaimer. HPBN is still recommended reading (almost prereq?) before tackling H3/QUIC, though.


Didn't know about this book; it looks extremely interesting, thank you :).


This is a nice post, but it feels a little weird: while HTTP/3 gets them 10ms on average, they could get 30ms on average by reducing Asia latencies (15% of traffic, ~300ms) to North America latencies (~100ms), since 15% × (300ms − 100ms) = 30ms. I'm sure they have some internal constraint that makes that difficult, but it stuck out at me.


That might require them to put replicas of the search service in Asia, and operating a distributed system over distant geographies is a hard operational problem.


Not necessarily: they could terminate the initial connection close to the user and then run a new connection. While this sounds like it shouldn't help, because the second connection is fully under their control they can make it highly reliable and very fast. Additionally, any recovery from dropped packets (on the user's home network) only has the latency of an in-region connection.

This also sets them up for running a light cache in-region, and potentially scaling that up to handle more kinds of API calls.
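A minimal sketch of the idea (Python asyncio, hypothetical host name): terminate the user's TCP connection at a regional edge box and relay the bytes over your own connection back to the US origin, so retransmits on the lossy last mile only pay the short in-region RTT.

    import asyncio

    ORIGIN = ("search-origin.example.com", 443)   # hypothetical far-away origin

    async def pump(reader, writer):
        # Copy bytes in one direction until EOF, then close our side.
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        writer.close()

    async def handle_client(client_r, client_w):
        origin_r, origin_w = await asyncio.open_connection(*ORIGIN)
        await asyncio.gather(pump(client_r, origin_w),
                             pump(origin_r, client_w),
                             return_exceptions=True)

    async def main():
        server = await asyncio.start_server(handle_client, "0.0.0.0", 8443)
        await server.serve_forever()

    asyncio.run(main())

A real deployment would terminate TLS at the edge and keep pooled, warmed-up connections back to the origin, but the latency argument is the same.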


Is this another way of saying "put a regional proxy there", or do you mean something more?


A regional proxy should be the same thing, yes. The important thing is that it's not a replica.


Yes, I'm sure they've considered doing that, as it seems so obvious. I would also think that any one account's usage is very region-specific, so could they actually keep that account's data near the user? It would be interesting to hear how they ended up at the compromise they have.


Or they could place the to-be-searched data closer to their users and get rid of traffic to North America entirely for most users.


It would be nice to see the 99th percentile, the 99.9th percentile, and the 99.99th percentile.

My browser has made over 100,000 network requests so far today, and I expect yours has too. I'm sure at least some of those will be in the 99.99th percentile.


Very interesting reading, thanks. I just wonder why there are these geographic differences from one region to another.


Speed of light. It takes time for a packet to reach North America from Asia.
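Back of the envelope, assuming roughly 10,000 km of trans-Pacific fiber and light travelling at about 2/3 of c in glass (both are rough assumptions, not measured figures):

    distance_km = 10_000          # rough East Asia <-> US West Coast fiber path
    fiber_km_per_s = 200_000      # ~2/3 of c in glass
    one_way_ms = distance_km / fiber_km_per_s * 1000   # ~50 ms
    rtt_ms = 2 * one_way_ms                            # ~100 ms floor

Real paths add routing, queuing, and any extra round trips on top of that.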


RTT to the US servers will be most of it, IMHO.


Note that some of the users on the worst networks are also users that won't be able to use HTTP3 due to corporate firewall restrictions that block UDP and downgrade TLS.

Source: I've worked on network protocols for video conferencing and have seen my share of corporate network horrors.


Hopefully, as HTTP3 picks up steam, this will stop happening.

But yeah, there's no amount of innovation that can't be undone by a motivated adversary. ;-)


One thing I'm wondering about: the HTTP3 path here seems to be from the end user to search, which is a general serving problem and not really about search itself. Do Dropbox clients call the search service directly?


I have wondered recently about head-of-line blocking's impact on latency at the AZ level within AWS. How reliable are TCP connections within the same datacenter, and do these issues noticeably impact performance at the tail?


Maybe there are some answers to that here

https://aws.amazon.com/blogs/hpc/in-the-search-for-performan...

where AWS explains why TCP isn't good enough for them and why they are developing a replacement.


It would be interesting to see similar experiments published from big edge networks like Cloudflare and Fastly. Do they have something similar published?


Just wondering: what HTTP3 solutions are there on the Linux side? Can IPVS be HTTP3 compatible at all?


Why not? As long as you can send and receive UDP traffic, there's no reason why you can't do http/3.

Caddy and HAProxy both support the protocol. nginx doesn't have it in the public stable version yet, though there are preview packages available for modern server platforms.


HTTP/3/QUIC supports migrating connections between two networks, such as when a user switches from Wi-Fi to LTE. IPVS or any plain UDP load balancer won't handle this scenario properly, since it doesn't introspect the QUIC header and load balance based on the QUIC connection ID. That connection ID is what keeps the connection stable when the device needs to switch networks. If operators have any sort of load balancer (like IPVS) between the client and the point where the HTTP/3 connection is terminated, they will need to ensure that it has proper QUIC support. One example is Katran[1], which supports this method of load balancing.
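To make that concrete, here's a minimal Python sketch (not Katran's actual code) of connection-ID-aware routing, assuming QUIC v1 long headers and self-issued 8-byte connection IDs for short-header packets:

    # Route by hashing the Destination Connection ID so a client that
    # migrates networks keeps landing on the same backend.
    import hashlib

    def backend_for_packet(payload: bytes, backends: list) -> str:
        first_byte = payload[0]
        if first_byte & 0x80:
            # Long header (Initial/Handshake): 1 flag byte, 4 version bytes,
            # then a 1-byte DCID length followed by the DCID itself.
            dcid_len = payload[5]
            dcid = payload[6:6 + dcid_len]
        else:
            # Short header: the DCID length is implicit, so this only works
            # if we issued fixed-length (here 8-byte) connection IDs ourselves.
            dcid = payload[1:9]
        h = int.from_bytes(hashlib.sha256(dcid).digest()[:8], "big")
        return backends[h % len(backends)]

A 5-tuple hash would break the moment the client's source address changes; hashing the connection ID doesn't.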

[1] https://github.com/facebookincubator/katran


Any other L4 OSS around with support for HTTP/3 besides Katran?

I've tried to use it, but it was a pain :).


Not that I am aware of.


> nginx doesn't have it in the public stable version yet

It's near, roadmapped for 1.25:

https://trac.nginx.org/nginx/roadmap


Yeah. I should test it; it would be interesting to see whether DR (direct routing) works with UDP balancing (via IPVS).


Can we use HTTP3 today? Is it widely supported?


Yes. Google and Facebook have deployed HTTP/3 across their properties for quite some time now. Support in popular servers is not mature but if you're using a CDN like Cloudflare, it's trivial to enable HTTP/3 for your users.


Nginx 1.25 has HTTP/3 in its roadmap, first cut from that branch is scheduled to land tomorrow:

https://trac.nginx.org/nginx/roadmap


We use it on our rinky-dink ecommerce site, and it represents 40% of requests at the moment. HTTP/1 and HTTP/2 are at 30% each. Our users skew a bit older, and definitely aren't techno-savvy (one scrolled past all the products and used the contact us page to ask for a printed product catalogue and order form...).


HTTP/1 is surprising, what browser is that coming from?


Can't tell, sorry, the cache service provider doesn't supply that info.


Wondering the same: how's compatibility with very restrictive networks? Say a university/corporate network that only allows port 443 and DPI'd port 80 egress. QUIC won't pass through that since it's UDP. Can the server and client reliably negotiate a fallback to HTTP/2 in such a case?


Yes. Browsers typically initiate a TLS (TCP) connection to port 443 for HTTP1/2, and then upgrade to HTTP/3 based on the Alt-Svc header.
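A quick way to see the advertisement (stdlib-only Python sketch): the first request goes over TCP, and the Alt-Svc response header is what tells the client it may retry over HTTP/3 on UDP 443.

    import urllib.request

    resp = urllib.request.urlopen("https://cloudflare-quic.com")
    print(resp.headers.get("Alt-Svc"))   # e.g. h3=":443"; ma=86400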


This. There are also clients that, with a little config, will cache the support level per host, and even let you provide a list of hosts for which the initial request should race TCP and QUIC.

https://developer.android.com/guide/topics/connectivity/cron...


What if your browser receives an Alt-Svc header and switches to HTTP/3 on one network (say, mobile data), but then you switch to a restrictive WiFi that has UDP disabled, all without restarting your browser, i.e. within one "session" of your HTTP client? Wouldn't you start having connectivity issues that would be hard to troubleshoot? In that scenario, having HTTP3 disabled is beneficial.


Changing network interfaces breaks TCP connections and forces a new handshake; the browser session works at a different layer and doesn't prevent that.

QUIC actually lets you migrate a connection between networks (because packets are identified by a connection ID in each UDP packet rather than by the 5-tuple). Clients will also typically re-test QUIC reachability occasionally and downgrade as needed, which is what makes this work.


HTTP/1.1 isn’t going anywhere in any popular server, so yeah, HTTP/2 fallback isn’t a problem. (HTTP/1.1 is basically required even if you just want to serve an https redirect on port 80, since h2 requires TLS. The cleartext h2c protocol has no adoption AFAICT.)


Yep; just like IPv4 hasn't gone anywhere, HTTP/1 and H2 aren't going anywhere anytime soon.


HTTP2 is probably the first one to go or be replaced. You can live without it, or upgrade to whatever the next thing is.


nit: it's HTTP/1.1; HTTP/1.0 is long gone.


Sadly, 1.0 is _still_ around. Rare, but it happens.

I remember being gobsmacked several years ago that F5 LBs only supported HTTP/1.0 health checks. Doing HTTP/1.1 health checks required writing one specifically for it, something the community had sorted out a long time ago.


I'm pretty sure I saw HTTP/1.0 in a PLC's status page recently.


IPv4 is not going away because it's valuable (as in, limited supply creates a market). HTTP/1.1 is already unusable on the modern web, as you'll be instantly blocked by Cloudflare and the gang. HTTP2 is likely to follow in the near future, as replacing it will be much easier than replacing HTTP/1.1.


Adoption was also slow because there were lots of legacy switches and routers that did not support it. Upgrades took a long time.


> IPv4 is not going away because it's valuable (as in limited supply creates a market).

That's what I've been telling my bros about crypto.


I'm not sure you understand. IP scarcity means IPv4 addresses are more valuable in the web-automation context. IPv6 availability will make it harder to "price" web automation -> more bots online -> more captchas and privacy invasions. That's a real challenge that is hard to solve, despite your snarky comments.


The IPv4 address scarcity means that bots cause a lot more collateral damage, because if your neighbor runs a bot from home, you'll most likely get banned too.

Yeah, there's value in convenience of being able to block misbehaving IPv4 addresses. There's also value in burning CPU cycles to produce a transaction on a blockchain.

Make of it what you will.


It remains missing from Safari. It used to be offered as an experimental feature, but I don't seem to have it anymore on Safari 16.5.


Safari added support for HTTP/3 in version 14, released in September 2020. That’s why it isn’t listed as an experimental feature any more.

https://developer.apple.com/documentation/safari-release-not...

You can check with Cloudflare’s test page here:

https://cloudflare-quic.com

Alternatively, open up developer tools, go to the network panel, right-click on the headings and enable the protocol field. It will tell you which protocol was used for each request made by your browser. Obviously the server also needs to support HTTP/3 for a request to use HTTP/3.


Another user here commented that it's been enabled randomly for a subset of users. But not for me. And there's nothing on the web I've found about how to enable it. It was once on the Experimental menu, but no longer.


It is supported and randomly enabled for ~50% of Safari 16 users, according to https://github.com/Fyrd/caniuse/pull/6664#issuecomment-14934...


You can enable http/3 just fine even if you also serve clients that are falling behind on web standards, as long as you also serve http/2 as a fallback.

Safari not supporting it isn't a problem unless your business only and specifically targets Safari, and even then it's a relatively painless way to help the few customers who run preview versions get a nicer experience.


IIRC Chrome has whitelisted some domains, but requires a flag to enable it generally.


QUIC is enabled by default for all domains across most major browsers: https://caniuse.com/http3

For Cloudflare, HTTP/3 accounts for 28% of traffic (https://radar.cloudflare.com/)


While HTTP3 does provide some improvements, it's clear that further optimizations are needed, particularly for users in Europe and Asia.

One potential solution that hasn't been discussed much here is leveraging AWS managed services like AWS CloudFront or AWS Global Accelerator. It's worth noting that Dropbox's website already uses AWS CloudFront, so they are already leveraging AWS in some capacity. Based on my cost calculations, using AWS CloudFront would cost around $40k a month, while AWS Global Accelerator would be around $22k a month.

As of August 22, 2022, AWS CloudFront supports terminating HTTP/3 in a Point of Presence (POP), which could potentially help with the latency issues Dropbox is facing. AWS Global Accelerator, on the other hand, is designed to improve the performance of applications by terminating UDP/TCP as close to users as possible then routing user traffic through the AWS global network infrastructure. This could help reduce latency by ensuring that user traffic is routed through the most optimal path, even if the user is located far from a Dropbox data center.

It's hard to estimate what the potential latency reduction of using e.g. AWS Global Accelerator would be, especially at higher percentiles. However, using https://speedtest.globalaccelerator.aws/, and assuming symmetry, my connections to Asia show 35-40% lower latency.

Of course, there are trade-offs to consider when using managed services like AWS CloudFront and AWS Global Accelerator. While they can provide significant performance improvements, they also come with additional costs and potential vendor lock-in. However, given the scale of Dropbox's operations and the importance of providing a fast, reliable search experience for their users, it may be worth exploring these options further.

---

Cost estimates

Assumptions:

    1. Dropbox's peak traffic is 1,500 queries per second (QPS).
    2. Average data transfer per query is 100 KB.
    3. 50% of the traffic comes from North America, 25% from Europe, and 15% from Asia (remaining 10% from other regions, for pricing purposes put it into Asia's calculations).
---

1) AWS CloudFront cost estimation:

    Data transfer:
    - North America: 1,500 QPS * 0.5 * 100 KB * 60 seconds * 60 minutes * 24 hours * 30 days = 194.4 TB  
    - Europe: 1,500 QPS * 0.25 * 100 KB * 60 seconds * 60 minutes * 24 hours * 30 days = 97.2 TB  
    - Asia: 1,500 QPS * 0.25 * 100 KB * 60 seconds * 60 minutes * 24 hours * 30 days = 97.2 TB

    Data transfer cost:
    - North America: 194.4 TB * $0.085/GB = $16,524
    - Europe: 97.2 TB * $0.085/GB = $8,262
    - Asia: 97.2 TB * $0.120/GB = $11,664

    Total data transfer cost: $16,524 + $8,262 + $11,664 = $36,450

    HTTP requests:
    Total requests: 1,500 QPS * 60 seconds * 60 minutes * 24 hours * 30 days = 3,888,000,000

    Using the updated AWS CloudFront pricing (as of May 22, 2023):
    - HTTP requests cost: 3,888,000,000 * $0.0075/10,000 = $2,916
Total estimated monthly cost for AWS CloudFront: $36,450 (data transfer) + $2,916 (HTTP requests) ≈ $40k

---

2) AWS Global Accelerator cost estimation:

    Data transfer:
    - Total data transfer: 194.4 TB (NA) + 97.2 TB (EU) + 97.2 TB (Asia) = 388.8 TB

    Using the updated AWS Global Accelerator pricing (as of May 22, 2023):
    - Data transfer cost (averaged across regions): 388.8 TB * $0.035/GB = $13,608
    - (Also need to add EC2 egress cost: 388.8 TB * $0.02/GB = $7,776)
    - Total data transfer cost = $21,384

    Accelerator:
    - Assuming 1 accelerator with 2 endpoints (1 for HTTP/2 and 1 for HTTP/3)
    - Accelerator cost: 1 * $18/accelerator/day * 30 days = $540
Total estimated monthly cost for AWS Global Accelerator: $21,384 (data transfer) + $540 (accelerator) = $22k
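Here's the same arithmetic as a small Python sketch, in case anyone wants to tweak the inputs (the prices, QPS, and traffic split are the assumptions above, not authoritative AWS figures):

    QPS = 1_500
    KB_PER_QUERY = 100
    SECONDS_PER_MONTH = 60 * 60 * 24 * 30

    monthly_gb = QPS * KB_PER_QUERY * SECONDS_PER_MONTH / 1_000_000   # ~388,800 GB
    split = {"NA": 0.50, "EU": 0.25, "Asia": 0.25}   # remaining 10% folded into Asia

    # CloudFront: per-GB egress by region plus a per-10k-request charge
    cf_price = {"NA": 0.085, "EU": 0.085, "Asia": 0.120}
    cf_transfer = sum(monthly_gb * share * cf_price[r] for r, share in split.items())
    cf_requests = QPS * SECONDS_PER_MONTH / 10_000 * 0.0075
    print(f"CloudFront: ~${cf_transfer + cf_requests:,.0f}/month")        # ~ $39k

    # Global Accelerator: per-GB premium + EC2 egress + fixed accelerator fee
    ga_transfer = monthly_gb * (0.035 + 0.02)
    ga_fixed = 18 * 30
    print(f"Global Accelerator: ~${ga_transfer + ga_fixed:,.0f}/month")   # ~ $22k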


Former Dropbox employee, just correcting one assumption:

> Dropbox's peak traffic is 1,500 queries per second (QPS).

I can't speak to search QPS directly, but most individual serving hosts for file sync/retrieval were receiving tens of thousands of QPS. The overall edge QPS peaked at several hundred thousand every day across all the hosts. So I'd guess that even just search is an order of magnitude higher than 1,500 :)


I misread the article when it stated:

> Traffic regularly exceeded 1,500 queries per second (QPS) at peak times

It may be quite cost prohibitive to use a managed service like AWS Global Accelerator.


tl;dr: HTTP3 provides a little improvement, but optimizing search speed is much more important in this case.


Although HTTP3 was a significant win for people with lossy connections and high latency.


like mobile


> tl;dr: HTTP3 provides a little improvement, but optimizing search speed is much more important in this case.

The article itself shows otherwise:

https://dropbox.tech/frontend/investigating-the-impact-of-ht...

    HTTP3 vs. HTTP2 | North and Central America | Europe         | Asia
    p25             | -3.20ms / -6%             | -2.34ms / -2%  | -3.73ms / -2%
    p50             | -4.21ms / -5%             | -3.84ms / -3%  | -5.12ms / -2%
    p75             | -9.03ms / -8%             | -11.1ms / -6%  | -15.0ms / -4%
    p90             | -44.9ms / -17%            | -47.3ms / -13% | -77.3ms / -14%
    p95             | -118ms / -22%             | -141ms / -21%  | -200ms / -22%

While the average-case improvement was negligible, the tail cases fared significantly better because HTTP3 removes TCP's head-of-line blocking.



