Building Your Own CDN for Fun and Profit (pasztor.at)
276 points by janoszen on Feb 14, 2018 | 64 comments



Fast nameservers are not as important as the author suggests. But either way, one extra level of indirection for nameservers would allow you to choose nameserver records dynamically too. And with large enough TTLs and some traffic, clients won't have to go all the way out to find the closest nameserver; the cache essentially hands them the fastest one. Since redundancy is built into DNS, a nameserver with a large TTL going down won't be a problem. And unlike anycast, this is much more reliable and much cheaper, since you don't have to rely on a single AS and its network infrastructure as a single point of failure, and you don't have to build one either. You can use as many different hosting providers as needed.


(I'm the author.) This whole setup is built for a comparatively low-traffic blog, so DNS caching won't help much. (On normal days I get ~100 visitors.) This is compounded by the TTL, which is 60s to account for node failures.

The optimization target is in the sub-one-second range, so not having to pay one large RTT penalty for a DNS lookup is quite important. I've measured 300+ ms RTT to Australia on the previous box I was using, and that impacted the load times quite severely.
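If you want to see the same effect on your own domain, here is a rough measurement sketch (assuming dnspython >= 2.0 is installed; the domain and nameserver IPs below are placeholders to replace with your own authoritative servers):

  # Rough sketch: time an uncached A-record lookup against two of your
  # authoritative nameservers to see the per-region RTT penalty.
  # Placeholders: DOMAIN and the nameserver IPs are not real values.
  import time
  import dns.resolver  # dnspython >= 2.0

  DOMAIN = "example.com"
  NAMESERVERS = {
      "eu-west": "192.0.2.10",
      "ap-southeast": "192.0.2.20",
  }

  for region, ip in NAMESERVERS.items():
      resolver = dns.resolver.Resolver(configure=False)
      resolver.nameservers = [ip]
      started = time.perf_counter()
      resolver.resolve(DOMAIN, "A")
      elapsed_ms = (time.perf_counter() - started) * 1000
      print(f"{region}: {elapsed_ms:.0f} ms")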


That's a fun project. For production websites and blogs, I'm pretty happy with Netlify, CloudFlare, and CloudFront. But CloudFront charges $600 per month for custom SSL certificates [1], so you could save a lot of money by just spinning up ~10 servers in different AWS regions.

I noticed this line at the bottom of the page: "When it comes to picking a solution, I often choose the less traveled road". I don't agree with that at all, and it sounds a bit like NIH syndrome. It's always better to choose the most-traveled roads, especially in DevOps. If there's a problem, then you can join those communities and contribute to the projects.

[1] https://aws.amazon.com/cloudfront/custom-ssl-domains/


> But CloudFront charges $600 per month for custom SSL certificates

This is misleading. CloudFront doesn't charge anything for putting your domains on an SSL cert that uses SNI. They only charge you if you need a cert without SNI, which requires them to allocate a dedicated IP address to you.

I'm hosting my personal blog on S3 and cloudfront, with SSL, for less than a dollar a month.

Performance and capabilities are fine for me, too. I get 0.15 seconds to first byte from Chicago, vs 0.24 for the author's site.

https://www.webpagetest.org/result/180214_K2_28c2826a6422b01...

https://www.webpagetest.org/result/180214_QT_a91f7af3bf78e7b...


If you are fine with having slashes at the end of your URLs and you do not want to do anything too complicated like content negotiation for image types, S3 and CloudFront are fine. The moment you turn on Lambda@Edge to do that kind of magic, things get slow after a period of no traffic.

I plan on expanding on the featureset, so no S3 for me. :)


Did you consider using periodic calls to keep the Lambda@Edge functions "warm"? I've been playing with Zappa (https://www.zappa.io) for standard Lambda and it sets this up by default.


Yes, but it's kind of a whack-a-mole since their reuse times are not public AFAIK, so it would constantly need tuning as they develop the service.
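For what it's worth, a keep-warm ping is just a scheduled request through the function; a minimal sketch, with the URL and interval as placeholders (and, as noted above, no documented reuse window to tune against):

  # Minimal keep-warm loop: periodically request a path that is handled by
  # the Lambda@Edge function so a container stays provisioned at the edge.
  # WARM_URL and INTERVAL are placeholders; since reuse times aren't public,
  # the interval is guesswork rather than a guaranteed fix.
  import time
  import urllib.request

  WARM_URL = "https://example.com/"
  INTERVAL = 300  # seconds

  while True:
      try:
          urllib.request.urlopen(WARM_URL, timeout=10).read(1)
      except Exception as exc:
          print("warm-up request failed:", exc)
      time.sleep(INTERVAL)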


"When it comes to picking a solution, I often choose the less traveled road"

I forgot to add that this applies only to R&D and hobby projects, for production setups I'm a bit more careful. :)

(I'm the author.)


Ah, that makes sense!


Thank you for pointing that out, I've updated my bio to reflect that. Hopefully this way it's a little less ambiguous. :)


I think SNI is fine, all modern browsers seem to support it: https://caniuse.com/#search=sni


If you can get by with only SNI connections then you don’t have to pay the $600 per month. The $600 is for a dedicated IP that will serve a single certificate.


'It's always better to choose the most-traveled roads, especially in DevOps'

Maybe. I agree it's not as clear-cut as always picking the less traveled road, but the difference between the two may include a competitive advantage that you'd be unwise to overlook. I mean, you're on HN; Paul Graham and Common Lisp back in the 90s are an excellent example.


That price is only for non-SNI browsers and devices, which are probably less than 1% of devices/browsers now [1]. Otherwise, CloudFront supports free custom certificates, and you can use Amazon's Certificate Manager to acquire and renew them automatically, also for free.

[1] https://www.digicert.com/ssl-support/apache-secure-multiple-...


$600/month for a custom SSL certificate is totally ridiculous. Cloudflare Pro ($20/month) includes a certificate that works with non-SNI browsers.


The $600/mo figure has to be considered against the type of workload your servers run, and how variable your traffic patterns are.

If you need to handle bursty traffic, you're likely going to get best value in shared tenancy services until you can fully utilize your servers. Otherwise, you will probably end up paying for idle infrastructure.


It has nothing to do with workload, that's the price for a dedicated IP to serve a TLS certificate to serve the (rapidly diminishing number of) non-SNI capable browsers and devices.


I use https://www.keycdn.com, which has free Let's Encrypt certificates.


"Second, BGP routes are not that stable."

This has been disproved empirically and academically for close to ten years [1]. Route flaps generally result in convergence to the exact same destination if it has another path and is still online. If it's offline, then things are working as intended, and that's no different from a server in a DNS pool going down and being rotated out.

1: Quick search: https://www.google.com/search?q=tcp+anycast+paper&ie=utf-8&o...


That's not our experience either. BGP is fine.

But it is the case that transit and peering connections are not stable (in the sense of going up and down randomly or suddenly experiencing high levels of packet loss) and active monitoring is a must.


Thank you to both of you, I've edited the article to clarify that point.


Is your experience that when routes reconverge, they still select the same end POP they did prior to the flap?

Assuming the end POP is still reachable along another regional route, I believe all the data I've seen shows that the client almost always hits the same destination they did before the flap.


What is a POP?


A POP (point of presence), or edge location, is a server (or several) that user traffic is routed to, hopefully close to the user. A CDN consists of multiple POPs, one in each region, with intelligent traffic routing added on top (as described in the article).


I have always thought it would be a fun and inspiring project to deploy a global CDN ... my career and my lifelong hobby have both been "UNIX sysadmin" and I love running networks ...

However, I spoke about this to some ISP/NANOG folks that I trust and they said that running a real CDN is a nightmare because all of your links (providers) hate you ... you're producing the exact opposite of the traffic that they want and they will not give you any breaks or help or benefits since you are their worst customer.

How accurate was that assessment?


It depends on the scale. Running a personal blog with sub-1MiB/s traffic is not a problem. I've seen some larger projects, though, where detailed data analysis had to be employed to debug bad connections... that's not a one-man job, and it was a serious headache to work around some of the less... neutral providers.


I have also heard the same. An interesting thing to do would be to also be a commercial ISP (i.e., sell to datacenters and businesses). Now that is the traffic all the ISPs want, since outgoing traffic goes to such networks.

Running a Global CDN and ISP might be a tad too ambitious.


Not at all accurate, assuming the "real CDN" is well run.


We currently use KeyCDN, which works out well, both performance- and money-wise. You may want to try it out.


Yeah. They are pretty good and very good value for money. We went from Cloudfront -> Edgecast -> KeyCDN and each change reduced our costs. Cloudfront can become really expensive since they charge for each HTTP request in addition to bandwidth.


I was using KeyCDN until I discovered bunnyCDN at $0.01/GB.


Same here. Cloudfront was ridiculous. KeyCDN works great for us.


Why KeyCDN? Why not MaxCDN or Fastly, etc.?


MaxCDN has shitty performance and terrible monitoring. We have to tell them when their servers are overloaded, because our monitoring detects regions with super-high SSL negotiation times.


MaxCDN performance is not great. Fastly charges for requests in addition to bandwidth.


If you have a specialized application, knowing how to do this can be quite useful. CDN PoPs are almost nonexistent across much of the Middle East and Africa. Sometimes building your own is the only way until a commercial offering becomes available.


Also, PoPs in some regions are often nearly useless even if they exist on paper.

For example, Cloudflare has a PoP in Seoul, but it has such limited bandwidth that most sites using Cloudflare are routed to Tokyo, Hong Kong, and even Los Angeles. Several of my clients in Korea signed up for Cloudflare a few years ago when the local PoP was still usable, but now all but two of them have canceled their subscriptions. Instead, I've been building a lot of caching proxies for them lately.

If anyone is here for the Winter Olympics right now and some of your favorite sites don't seem to be living up to Korea's reputation for ultra-fast internet, Cloudflare might be one reason. (Meanwhile, Amazon's PoP in Seoul is perfectly fine, albeit expensive.)


As an example: Cloudflare has a Vancouver PoP, but Telus in Vancouver doesn't use it; all traffic is routed to Seattle.

  colo=SEA
  spdy=h2
  http=h2
  loc=CA


I don't understand how his use of Traefik gets round the SSL pain point?

> Using SSL/TLS certificates

> The next pain point is using SSL/TLS certificates. Actually, let’s call them what they are: x509 certificates. Each of your edge locations needs to have a valid certificate for your domain. The simple solution, of course, is to use LetsEncrypt to generate a different certificate for each, but you have to be careful. LE has a rate limit, which I ran into on one of my edge nodes. In fact, I had to take the London node down for the time being until the weekly limit expires.

> However, I am using Traefik as my proxy of choice, which supports using a distributed key-value store or even Apache Zookeeper as the backend for synchronization. While this requires a bit more engineering, it is probably a lot more stable in the long run.


Traefik can simply request certificates using the DNS verification method, as opposed to the certbot HTTP verification. (HTTP would not work with a distributed setup like this.) Alternatively, Traefik can also synchronize certificate requests using one of the many key-value stores supported (untested as of yet).

The drawback of the DNS method without synchronization between the nodes is that you run into the LetsEncrypt rate limit quite easily. My expansion to ap-southeast-1 and sa-east-1 is waiting for the LE cooldown.

Disclaimer: I'm the author of the article.
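For anyone unfamiliar with the DNS method: the DNS-01 challenge boils down to publishing a TXT record that Let's Encrypt then looks up. Traefik automates this; purely as an illustration, the same step done by hand against Route 53 might look like the sketch below (boto3 assumed; the zone ID and token value are placeholders).

  # Illustration only: publish the _acme-challenge TXT record that the
  # DNS-01 challenge requires. Traefik's ACME client does this for you;
  # the hosted zone ID and token value below are placeholders.
  import boto3

  route53 = boto3.client("route53")
  route53.change_resource_record_sets(
      HostedZoneId="Z_PLACEHOLDER",
      ChangeBatch={
          "Changes": [{
              "Action": "UPSERT",
              "ResourceRecordSet": {
                  "Name": "_acme-challenge.example.com.",
                  "Type": "TXT",
                  "TTL": 60,
                  # TXT values must be wrapped in double quotes
                  "ResourceRecords": [{"Value": '"token-from-acme-server"'}],
              },
          }]
      },
  )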


I guess they can use Traefik to distribute the certificates: instead of having each node request its own set of certificates, they can request them once, distribute the certificates to all the other nodes, and stay under the limit set by Let's Encrypt.


The author mentions, as a reason not to use Cloudflare, that the CDN cache is purged often. If you want to verify whether that happens for your content, you can try this tool: http://cloudperf.speedchecker.xyz/cloudflare-tester.html

A side effect of this tool, as you might have guessed, is that using it will actually prolong the time your content stays in their cache.


So one could set up an automated crawler that runs frequently to keep everything in cache?


Yes, but you would need a crawler that does so in every region, or at least know the IPs of the edge nodes on that CDN. You would probably also hit some rate limit / DDoS protection with the CDN itself.
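If you do know the edge node IPs, the warming crawler itself is straightforward; a sketch, with made-up IPs and plain HTTP for brevity (a real CDN would likely force HTTPS and may rate-limit this):

  # Sketch: warm each known edge node's cache directly by sending the same
  # requests to every edge IP with the site's Host header set.
  # The IPs and hostname are placeholders; plain HTTP only for brevity.
  import http.client

  EDGE_IPS = ["192.0.2.1", "192.0.2.2"]
  HOSTNAME = "example.com"
  PATHS = ["/", "/feed.xml"]

  for ip in EDGE_IPS:
      conn = http.client.HTTPConnection(ip, timeout=10)
      for path in PATHS:
          conn.request("GET", path, headers={"Host": HOSTNAME})
          resp = conn.getresponse()
          resp.read()  # drain the body before reusing the connection
          print(ip, path, resp.status)
      conn.close()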


https://github.com/apache/incubator-trafficcontrol is an open-source cache control layer (working with ATS, Apache Traffic Server) that has features for header rewrites, SSL, and custom URLs (among others). It is built for video but can be used to cache any content. Probably a bit heavy for your use case infrastructure-wise, though.


Interesting, although I specifically wanted to build a push CDN (where I can push the content) rather than a pull CDN (that works with an origin) to avoid the added latency with cache misses.


Makes sense. I am enjoying looking through the source, as we are moving to an Ansible and hopefully dockerized deployment model.


Of course it's dockerized, it has to be cool, right? :)

Ansible is running docker-compose up -d on deployment and Traefik is doing the magic. I want to extend it to host multiple sites in the future. (Btw, Ansible run from a central location is painfully slow because of the high latency to the edge nodes.)

The content itself is deployed using rsync, Ansible was just too painfully slow for that.
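Not the actual deploy script, but the rsync fan-out could be as simple as something like this (hostnames and paths are made up):

  # Sketch of a parallel rsync fan-out to the edge nodes so the push isn't
  # serialized behind the slowest link. Hostnames and paths are made up.
  import subprocess
  from concurrent.futures import ThreadPoolExecutor

  EDGE_NODES = ["edge-eu-west", "edge-us-east", "edge-ap-southeast"]
  SRC = "public/"            # locally built static site
  DEST = "/var/www/site/"    # docroot behind Traefik on each node

  def push(node):
      result = subprocess.run(
          ["rsync", "-az", "--delete", SRC, f"{node}:{DEST}"],
          check=False,
      )
      return node, result.returncode

  with ThreadPoolExecutor(max_workers=len(EDGE_NODES)) as pool:
      for node, rc in pool.map(push, EDGE_NODES):
          print(node, "ok" if rc == 0 else f"rsync exited {rc}")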


> Second, BGP routes are not that stable. While DNS requests only require a single packet to be sent in both directions, HTTP (web) requests require establishing a connection to download the content. If the route changes, the HTTP connection is broken.

I thought Cloudflare uses anycast to avoid targeted DDoS? How do they handle changing routes during HTTP requests?


There's a lot of fear around the possibility of flapping routes, but a lot of real-world data seems to show it doesn't impact web traffic that often.

People often mix and match anycast/geo DNS and anycast/unicast HTTP.

Some even go a step further and, for video files, anycast to a node that 302s to its own unicast address.


Indeed. There are also right ways to set up anycasting and wrong ways.

Right way: 1-2 major Tier1 carriers across all of your PoPs with local peering for regional eyeball networks.

Wrong way: Using a different set of transit carriers at each location.

You really don't want that many AS paths to reach your content from a given location (3-4 is more than enough). What you're really going for with BGP anycasting is that your local ISP has a direct route to the closest PoP via exchange peering, or that the Tier1 path drops you off at the "closest" route. Transit carriers do this for a living, and they're usually quite good at figuring out route weighting inside their own network.

Yes, I know Netflix does it differently but they use a lot more smart geo DNS routing than anycasting.

Edit: IMHO it's also better to choose a Tier1 with a moderate-sized network that values stability and performance over size. So someone like NTT over, say, Level3.


Anycast means there are multiple routes going to the same destination. You get the route that is the shortest path via BGP to the anycast IP (least number of BGP hops). Once you have an established TCP session via one route, it will remain established through that route, as long as that route is still the “shortest” between your IP and the anycast IP.

The route will not “change” unless cloudflare changes their routing, or you change your location/IP so that a shorter route exists. Once you’ve changed your IP, you’ve already interrupted any TCP sessions anyway.

You might find these two blog posts from LinkedIn to be helpful:

https://engineering.linkedin.com/network-performance/tcp-ove...

https://engineering.linkedin.com/blog/2016/04/the-joy-of-any...


I think it should be clarified that "destination" refers to an IP address, not an individual host. My understanding is that anycasting means a single address corresponds to multiple hosts, achieved by simply advertising it from several sources with BGP, and you will often still have multiple redundant routes to any individual host behind the anycast IP, because most locations have redundant internet links.

Depending on how the routing is set up, it doesn't matter if the route changes, so long as you end up on the same host consistently (or one that can at least pretend it's the same host, if you do some kind of fancy session mirroring, perhaps).


Google Cloud's global load balancer seems to do this fancy session mirroring, because you only get one IP for HTTP load balancing. I am very often impressed by the GCP products.


There are other techniques that Google's routers most likely use to load-balance traffic transparently to multiple hosts. A relatively simple way is to hash the (source, destination) address pair of the IP packet to determine which host to forward the packet to, so it doesn't necessarily require mirroring or any state. Only seamless failover when a host fails requires the fancy tricks.
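A toy sketch of that (source, destination) hashing idea (not Google's actual implementation; the backend names are placeholders):

  # Toy illustration of stateless 2-tuple hashing: every packet of a given
  # (source, destination) pair maps to the same backend, so no per-flow
  # state or session mirroring is needed. Not Google's implementation.
  import hashlib

  BACKENDS = ["host-a", "host-b", "host-c"]

  def pick_backend(src_ip: str, dst_ip: str) -> str:
      digest = hashlib.sha256(f"{src_ip}|{dst_ip}".encode()).digest()
      return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

  # The same pair always lands on the same host:
  assert pick_backend("198.51.100.7", "203.0.113.1") == \
         pick_backend("198.51.100.7", "203.0.113.1")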


> The route will not “change” unless cloudflare changes their routing, or you change your location/IP so that a shorter route exists. Once you’ve changed your IP, you’ve already interrupted any TCP sessions anyway.

That's what I thought, too. But the article explicitly states this as a potential issue.


For a more "own" CDN, here is another write-up: https://www.linkedin.com/pulse/build-your-own-anycast-networ...


I have tried using AWS Route 53's latency-based records, but for some reason they never worked for me. I need to check it again.


Why not just set up a server that requests the website every few minutes or seconds? That way it would always stay in the cache.


I am curious: does anyone know how well Akamai works in the CDN world?


They're good but only at $100k per month and above. You really need to use their full suite of products to get the full benefit, and by that time you'll be at $100k per month.

They are OK, but not great, for smaller accounts.


At work we use Akamai to serve a large website, we pay about a tenth of that, and it definitely benefits us. But that's only because they simply have a large network and are one of the few CDNs that have PoPs close to our customers. Other than that, it's just overkill for small businesses.


Terrible to work with and not all that performant either. They also caused us 24+ hours of downtime with a forced configuration change on their side, which undid some configurations their professional services had implemented (yeah, some parts of the UI are only modifiable by PS). Luckily we had CloudFront integration as a backup, so we switched over to that until the Akamai team finally decided to fix our problem.


Don't use them unless you have to. The vendor lock-in is strong.



