Agreed, my best guess is that it's due to a smaller MTU between the CDN and your device. They are probably replying with a TLS Server Hello, which would typically max out a standard 1500-byte packet. That's likely why plain HTTP isn't working either: they would ACK the connection, and you could probably issue the GET /, but you would never get a response back because the HTTP response payload is larger than a single packet.
A few ideas to test this theory:
1) Find an asset on their server that is smaller than 500-1000 bytes so the entire payload fits in a single packet. Maybe a HEAD request would work?
2) Clamp your MSS for this IP to something much smaller, like 500 instead of the standard 1460. This should force the server to send smaller packets, and it works better in practice than changing your MTU (a sketch of both tests follows this list). See: https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.cookbook.mtu-...
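Here's a rough Go sketch combining both tests: clamp the socket's MSS via TCP_MAXSEG (Linux-only) and issue a HEAD request so the response stays tiny. The CDN URL is a placeholder, and the 500-byte MSS is just the value from idea 2; the linked HOWTO does the clamping on the router with iptables instead, which is the more usual approach.

    package main

    import (
        "fmt"
        "net"
        "net/http"
        "syscall"
        "time"

        "golang.org/x/sys/unix"
    )

    func main() {
        dialer := &net.Dialer{
            Timeout: 10 * time.Second,
            // Clamp the MSS before the SYN goes out (Linux-only). The server
            // should then segment its responses into ~500-byte packets, which
            // sidesteps any path-MTU black hole between you and the CDN.
            Control: func(network, address string, c syscall.RawConn) error {
                var sockErr error
                if err := c.Control(func(fd uintptr) {
                    sockErr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_MAXSEG, 500)
                }); err != nil {
                    return err
                }
                return sockErr
            },
        }
        client := &http.Client{Transport: &http.Transport{DialContext: dialer.DialContext}}

        // HEAD keeps the response body empty (idea 1). If this succeeds where a
        // normal GET hangs, a path-MTU problem is very likely.
        resp, err := client.Head("https://cdn.example.com/") // placeholder host
        if err != nil {
            fmt.Println("HEAD failed:", err)
            return
        }
        resp.Body.Close()
        fmt.Println("HEAD succeeded:", resp.Status)
    }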
The load shifting part is similar to the way BigTable splits, merges, and assigns tablets. But the rest of it is not related, because BigTable does not try to offer mutation consistency across replicas. If you write to one replica of a BigTable, your mutation may not be readable at some other replica until after an undefined delay. Applications that need stronger consistency must layer their own replication scheme atop BigTable (as Megastore does).
What this post is describing for replication seems more comparable to Spanner.
I don't understand this comment. Bigtable requires that each tablet is only assigned to one tablet server at a time, enforced in Chubby. There's no risk of inconsistent reads. Of course this means that there can be downtime when a tablet server goes down, until a replacement tablet server is ready to serve requests.
Right, the contrast I was trying to draw is between what they depict, where multiple nodes each hold a replica of the tablet and synchronously replicate between themselves, and what BigTable would do, which is to copy the entire table elsewhere via mutation log shipping. What they are doing is more analogous to how Spanner does replication.
Unless you're doing multi-cluster replication, there is no log shipping in BigTable: the data replication within a cluster is taken care of by the underlying filesystems.
HTTP/3/QUIC supports migrating connections between networks, such as when a user switches from WiFi to LTE. IPVS, or any UDP load balancer that only hashes on the source address, won't handle this scenario properly, since it doesn't introspect the QUIC header and balance on the QUIC connection ID. That connection ID is what keeps the connection stable when the device switches networks. If operators have any sort of load balancer (like IPVS) between the client and the point where the HTTP/3 connection is terminated, they will need to ensure it has proper support for QUIC. One example is Katran[1], which supports this method of load balancing.
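For illustration, here's a hedged Go sketch of connection-ID-aware balancing: extract the Destination Connection ID per RFC 9000 and hash on that instead of the client's address. The fixed short-header CID length is an operator-configured assumption, and a production implementation (Katran's, say) does considerably more, e.g. servers encoding routing hints into the CIDs they issue.

    package main

    import (
        "errors"
        "hash/fnv"
    )

    // Deployment-specific: short headers don't carry a CID length field, so the
    // balancer and servers must agree on one out of band.
    const shortHeaderCIDLen = 8

    // extractDCID pulls the Destination Connection ID out of a QUIC packet.
    func extractDCID(pkt []byte) ([]byte, error) {
        if len(pkt) < 1 {
            return nil, errors.New("empty packet")
        }
        if pkt[0]&0x80 != 0 {
            // Long header: 1 flags byte, 4 version bytes, 1 DCID-length byte, DCID.
            if len(pkt) < 6 {
                return nil, errors.New("truncated long header")
            }
            dcidLen := int(pkt[5])
            if len(pkt) < 6+dcidLen {
                return nil, errors.New("truncated DCID")
            }
            return pkt[6 : 6+dcidLen], nil
        }
        // Short header: DCID starts right after the flags byte.
        if len(pkt) < 1+shortHeaderCIDLen {
            return nil, errors.New("truncated short header")
        }
        return pkt[1 : 1+shortHeaderCIDLen], nil
    }

    // pickBackend hashes the connection ID, so the same connection maps to the
    // same backend regardless of the client's current source IP and port.
    func pickBackend(dcid []byte, backends []string) string {
        h := fnv.New32a()
        h.Write(dcid)
        return backends[h.Sum32()%uint32(len(backends))]
    }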
Great post! I would have loved to see P2C (Power of 2 Choices) in there as well, which is typically a better alternative to Round Robin and Least Connections.
P2C is really cool, but it would have meant having to talk about load balancers with incomplete information. This felt like slightly too much to add to an already-quite-long post. It also would have added an extra layer of complexity to my already-quite-complex simulation code :sweat_smile:
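For anyone curious, the core of P2C is tiny to sketch in Go: sample two backends at random and send the request to the one with fewer in-flight requests. The Backend type and its counter here are illustrative, not from any particular proxy.

    package main

    import (
        "math/rand"
        "sync/atomic"
    )

    type Backend struct {
        Addr     string
        InFlight atomic.Int64 // outstanding requests; caller increments/decrements
    }

    // pickP2C avoids the herding you get with Least Connections on stale load
    // data (everyone piling onto one "least loaded" node), while still beating
    // Round Robin when request costs vary. Sampling the same backend twice is
    // possible and harmless for a sketch.
    func pickP2C(backends []*Backend) *Backend {
        a := backends[rand.Intn(len(backends))]
        b := backends[rand.Intn(len(backends))]
        if a.InFlight.Load() <= b.InFlight.Load() {
            return a
        }
        return b
    }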
zombiezen/go-sqlite uses cznic's pure-Go translation of SQLite, so it avoids CGo. It's explicitly stated to be "a fork of crawshaw.io/sqlite that uses modernc.org/sqlite, a CGo-free SQLite package. It aims to be a mostly drop-in replacement for crawshaw.io/sqlite."
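A minimal usage sketch, with the API as I recall it from the package docs (sqlite.OpenConn / sqlitex.ExecuteTransient); verify against zombiezen.com/go/sqlite before relying on it:

    package main

    import (
        "fmt"

        "zombiezen.com/go/sqlite"
        "zombiezen.com/go/sqlite/sqlitex"
    )

    func main() {
        // No CGo toolchain needed: the driver underneath is modernc.org/sqlite,
        // a pure-Go translation of the SQLite C source.
        conn, err := sqlite.OpenConn(":memory:")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        err = sqlitex.ExecuteTransient(conn, "SELECT 'hello from pure-Go sqlite'", &sqlitex.ExecOptions{
            ResultFunc: func(stmt *sqlite.Stmt) error {
                fmt.Println(stmt.ColumnText(0))
                return nil
            },
        })
        if err != nil {
            panic(err)
        }
    }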
The downside with "automagically" trying to handle idempotency is that users may not be aware of it, and retries may happen across different processes (say they're running their application on k8s with multiple pods), which doesn't work with Stripe's default behaviour.
IMO setting the idempotency key should be required, which would make users aware that they need to handle retries properly.
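As a hedged sketch of that with stripe-go (the v76 module path, amount, and orderID are assumptions for illustration), deriving the key from a stable business identifier makes retries safe no matter which pod issues them:

    package main

    import (
        "fmt"

        "github.com/stripe/stripe-go/v76"
        "github.com/stripe/stripe-go/v76/paymentintent"
    )

    func createPayment(orderID string) (*stripe.PaymentIntent, error) {
        params := &stripe.PaymentIntentParams{
            Amount:   stripe.Int64(2000),
            Currency: stripe.String(string(stripe.CurrencyUSD)),
        }
        // Same orderID => same key => Stripe returns the original result instead
        // of creating a duplicate charge, regardless of which process retries.
        params.SetIdempotencyKey("order-" + orderID)
        return paymentintent.New(params)
    }

    func main() {
        stripe.Key = "sk_test_..." // placeholder
        pi, err := createPayment("12345")
        if err != nil {
            panic(err)
        }
        fmt.Println(pi.ID, pi.Status)
    }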
You might find differing philosophies depending on where you look, but a recurring theme you'll find with Stripe's is that they try to make it as easy as possible to get an integration up and running. When you're building out a payment integration, you're already awash in non-trivial concepts that are probably new and novel to you, so things that can be abstracted away for the time being to make things easier generally are.
In the situation you describe, I think it would make more sense to just retry the call a couple of times from the same pod, rather than paying the sizable overhead of tearing a pod down and spinning up a new one for every failure; in that case the automatic keys work fine. And if there's a really good reason retries can't stay in one process, setting the keys manually is very easy. At some point, if you're far enough off the beaten path, you have to expect to read some docs.
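A minimal sketch of that in-process retry with stripe-go (attempt count and backoff are illustrative, and the official clients can also be configured to retry on their own); it assumes the caller set an idempotency key on params once, as in the snippet above, so every attempt dedupes to the same request:

    package main

    import (
        "time"

        "github.com/stripe/stripe-go/v76"
        "github.com/stripe/stripe-go/v76/paymentintent"
    )

    func createWithRetry(params *stripe.PaymentIntentParams, attempts int) (*stripe.PaymentIntent, error) {
        var lastErr error
        for i := 0; i < attempts; i++ {
            pi, err := paymentintent.New(params) // same params, same idempotency key
            if err == nil {
                return pi, nil
            }
            lastErr = err
            time.Sleep(time.Duration(i+1) * 500 * time.Millisecond) // crude linear backoff
        }
        return nil, lastErr
    }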
[1] https://github.com/envoyproxy/envoy/pull/18780