Rolling your own CDN for $25 in 1 hour (scalescale.com)
324 points by mxpxrocks10 on July 27, 2014 | hide | past | favorite | 89 comments



Nice. For tutorials like this it would be nice to see more details on the "how" of each step. For example:

"We setup DNS info to point here (both NS1 and NS2..."

That's great, but it would be nice to see:

"Using the geodns web-based configuration tool, we setup DNS info to point here (both NS1 and NS2..."

or

"Running <insert full command line here including tool and all options used here> on <insert environment here>, we setup DNS info to point here (both NS1 and NS2..."

or

"By editing /etc/<file> on the <some environment>, we setup DNS info to point here (both NS1 and NS2..."

In other words specifics are always really great to include.

But still this was a really great writeup, not just for the overview it gives of one lightweight approach but for the comments it has kicked off.


I think Genius and DigitalOcean = a match made in heaven. The quality would go from "the best" to "untouchable", if the comments were value adding. Imagine being able to dive deeper into any piece of the tutorial you wanted.


not to sound like a n00b, but what is Genius in your context?


I think he means Genius the company (formerly RapGenius) which annotates the web. (http://genius.com/)


oh ok thanks, I thought that, but I was also thinking maybe it was some tech stack like a docker container for a moment.


I was planning to publish a post with literally the same title as this one later this week (annoying coincidence as I guess mine will now be largely ignored - maybe I should wait a few weeks to let this space calm down, I'm not sure) with some in-depth step-by-step explanation - so watch out for that :)


How would I go about getting a link for the site you are going to publish article to?

This article's title piqued my interest, but in the end it lacks some key pieces of info I would need to try it.


It'll be on my blog, reinterpretcast.com and I'll probably tweet about it from @_joesavage. I'll probably also post to HN and Reddit, but I'm not sure if it'll gain much traction since it's so similar in concept to this article here.


It will, if it's a detailed guide. This article is an overview.


As expected (and possibly not aided by my timing, I'm unsure), my submission hasn't really gotten much attention - the link is here if you're interested though: http://www.reinterpretcast.com/roll-your-own-cdn


This comment is a model of quality feedback. It shows appreciation, not just shallow bookending, and gives detailed improvement examples. I know this is off topic a bit, sorry.


> Step 1: Order a DNS instance from Digital Ocean.

This also feels lacking to me. How does one order a DNS instance from DO? I can tell from the icon that it has something to do with CentOS but I don't see any sort of DNS server under their available applications or images for a one-click install.


great points. happy to take this approach on future posts and would love to collaborate in the future.


I really liked your write-up. As natch mentioned, it will be great to include more details in the article. I like the detail level of Digital Ocean articles.

Based on my recent trial experience with CloudFlare, it seems both CDN startups might share this issue: incomplete or insufficient instructions for users/customers to properly implement the solution, and a lot of assumptions about the skill level of the people implementing it.

Actually, I ended up providing detailed instructions to CloudFlare on configuration and testing of their service, along with a list of questions whose answers should be readily available if CF is targeting enterprise customers.


thanks for the feedback. appreciate it.


I'd recommend CloudFlare if you want AnyCast geo-DNS, and a free CDN. We don't charge for bandwidth or our DNS service.

Disclosure: yes, I work at CloudFlare but the previous details are simply facts.


Hi, I heard that CloudFlare is now blocked in China, do you have more information?


CloudFlare is not blocked in China. A handful of IPs could be potentially, but if you're seeing something indicating that I'd recommend contacting our support team.

The GFW (Great Firewall of China) also changes daily (or sometimes hourly), so it's entirely possible that whatever may have been an issue at one point in time has already been addressed.


Do you have a link to CloudFlare geo-DNS? I have multiple servers in multiple parts of the world, and didn't know that Cloudflare supported that.


CloudFlare uses AnyCast. We'll route any incoming requests to the nearest data center automatically. http://blog.cloudflare.com/a-brief-anycast-primer No additional configuration required.

If you're referring to making different origin calls based on the geographic region (for example, when a request hits our LAX data center and you happen to have an origin near LAX, we'd use that origin rather than your other origin, which might be located in, let's say, LHR), we currently only support that functionality for our higher-end enterprise contracts. The functionality will likely be available for other plan levels in the future, though.


anycast does not route to the geographically nearest datacenter but the topologically nearest. anycast is great, but it isn't magic.

I get it: you have marketing folks that probably specifically told you not to delve into the details but let's face it..."anycast routes it to the nearest datacenter automatically!" isn't completely true.


Hey, this brings up a good point for some more clarification.

There are two common ways CDNs do routing:

method 1) by using a customized dns server (like the one I used in the example) that responds differently to the resolvers end users use. You can use something like Maxmind (like I did in the example) to determine where you think that resolver is.

method 2) you can use anycast all the way to the TCP level for terminating traffic to ports 80 and 443. This is the most common way we do it at MaxCDN.com. We find it to be the fastest.

To do method 1 you can use the Go GeoDNS server from GitHub (and there's one in Perl), or you can use a service like Dyn.com, DNSMadeEasy (only has broad geo), NSone.net, Verisign's DNS product, or Cedexis (which can also incorporate latency data). One thing to keep in mind with DNS-based routing is EDNS (client subnet) support from the big public DNS providers. The downside is that you can only respond based on the resolver, or the EDNS subnet, that someone uses, which can lead to a lot of inaccuracies. The big pro of method 1 is that it's easy to deploy, and it's easy to balance traffic if a PoP gets overloaded since it's just DNS records.
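
For illustration only, here is a rough Python sketch of the decision a method 1 DNS server makes. It is not the GeoDNS server's actual code. The PoP names, coordinates, and IPs are made up (the IPs come from documentation ranges), and the resolver's location is assumed to have already been looked up in a GeoIP database such as Maxmind, which is not shown.

```python
import math

# Hypothetical PoPs: name -> (latitude, longitude, IP to hand out).
# The IPs are from documentation ranges (RFC 5737), not real servers.
POPS = {
    "nyc": (40.71, -74.01, "192.0.2.10"),
    "ams": (52.37, 4.90, "198.51.100.10"),
    "sgp": (1.35, 103.82, "203.0.113.10"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_pop(resolver_lat, resolver_lon, healthy=None):
    """Return (name, ip) of the PoP closest to the resolver's estimated location.

    healthy: optional set of PoP names to consider, so an overloaded PoP
    can be taken out of rotation just by leaving it out of the set.
    Raises ValueError if no PoP is eligible.
    """
    candidates = {
        name: pop for name, pop in POPS.items()
        if healthy is None or name in healthy
    }
    name = min(
        candidates,
        key=lambda n: haversine_km(resolver_lat, resolver_lon,
                                   candidates[n][0], candidates[n][1]),
    )
    return name, candidates[name][2]
```

Note that the input is the resolver's location, not the end user's, which is exactly where the inaccuracies mentioned above come from.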

Anycast (Method 2) is great and fast, but it is high maintenance. We've spent a lot of time tweaking this over the past 4 years at MaxCDN. I've heard you can announce anycast blocks with ServerCentral and Internap, but I haven't tried it. I've done it with Softlayer for testing. You need to have your own /24 to do it with them. We have our own infrastructure and several upstream providers. The biggest challenges with anycast are dealing with Asia (to make sure routes don't trombone) and traffic management, since you can't control where traffic goes.

One of our guys did a blog post on this a while ago: http://blog.maxcdn.com/anycast-ip-routing-used-maxcdn/

Hope this is useful for some!


I love MaxCDN, keep up the good work guys!


"anycast does not route to the geographically nearest datacenter but the topologically nearest."

That is correct, and as you point out in a later comment, in roughly 85-90% of cases the geographic route and the topological route will be identical and ideal. In the other 10-15% of cases, some additional network engineering can be done to work with a problem ISP to correct the less-than-ideal routing for a customer.


Good point. In my testing of CF, I couldn't get my requests routed to the nearest datacenter, even though the CF status page indicated green for datacenters near my request origin. It always routed me to SJC, irrespective of which location in the US the request originated from.


Something else was going on then. Did you have a ticket open with our support team by chance?


Surely topologically closest is most often what you want anyway?


sure, 80-90% of the time you get exactly what you want. but depending on your application and use-case, 10% can be a deal breaker of sorts.

not that Geo location from notoriously inaccurate whois data is any better.


yeah, if you're only controlling one side of the conversation you just have to deal with the limitations. We use a lot of data from RUM measurements to further tune things.

That being said, if you control both sides (e.g. Aspera, Netflix clients, etc.) you can do some better stuff and really ensure accuracy. Please feel free to post any good links to this stuff.


before you gleefully skip into the panacea geo-DNS has been marketed as, please take a quick skim of this 2009 paper[1] from microsoft research about mobile clients and geolocation.

has it gotten better in five years? is an exercise left for the reader :)

[1] http://research.microsoft.com/en-us/um/people/maheshba/paper...


That seems to be of more concern for delivery of localized content than optimizing for closest data centre.


This geo-DNS is related to finding Cloudflare's servers, not finding your servers. I love and use Cloudflare but their support for Route 53 like multi-host configurations (health checks, latency, etc) is pretty bad (I think they only support roundrobin cnames).


rolling your CDN on VPS instances isn't exactly cost-effective, and is probably more of an exercise for fun than anything else. Unless you need features out of your CDN that you can't get from existing CDN providers, I don't really see the point.

also kind of unusual that this blog post is by the president of MaxCDN.


kind of unusual that this blog post is by the president of MaxCDN

It makes a lot of sense for the president of MaxCDN to make a blog post like this. Bob the developer decides to learn about how CDNs work and happens upon this blog post. Cool, Bob just built his own mini CDN for fun. Now Bob understands that his CDN, though a fun weekend project, is definitely not production ready. What does Bob do when he needs a CDN he can put into production? He goes to the guys that established themselves as authorities on the topic: MaxCDN.

Content Marketing 101: http://www.kalzumeus.com/2012/02/09/why-i-dont-host-my-own-b...


Exactly this. I'm likely to need a CDN in the near-to-distant future. MaxCDN is now on my list because of this post


Actually, with DigitalOcean, you get quite a lot of bandwidth for the price. Additionally, their transfer limits are high (1TB for the $5 instances), and overage is only $0.02/GB. If you combine this with something like Route 53's anycast latency-routed DNS, it can work very well.
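
A quick Python sketch of that pricing, using only the figures quoted above ($5 instance, 1 TB included, $0.02/GB overage). Treating 1 TB as 1000 GB is my assumption, and the function name is just for illustration.

```python
def monthly_cost(transfer_gb, base_price=5.00, included_gb=1000,
                 overage_per_gb=0.02):
    """Monthly cost in dollars for one instance, given outbound transfer in GB.

    Defaults are the figures quoted in the comment above; 1 TB = 1000 GB
    is an assumption.
    """
    overage_gb = max(0, transfer_gb - included_gb)
    return base_price + overage_gb * overage_per_gb
```

So an instance that stays under its allowance costs the flat $5, and one that pushes 1.5 TB in a month comes to about $15.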


with DigitalOcean they ask you do not run a CDN: https://www.digitalocean.com/community/questions/do-you-have...


That question is 2 years old, and refers to an "unlimited bandwidth" fair-use policy. DO does not have unmetered bandwidth, so is it still relevant?

Their current policy FAQ says they do not even explicitly forbid Tor exit nodes, although there is a stern warning, so why would they disallow a proxy configuration dedicated solely to your site?


I'm assuming that's a paid CDN with clients like CloudFlare or Akamai, not you distributing your website to multiple datacenters in order to serve more clients faster. If you happen to use CDN like DNS features to route the client to the closest Digital Ocean datacenter, you're probably in the clear.


Good point. I wonder if they meant "we ask you don't run a CDN _service_" rather than just a CDN for your website.

After all, what is the difference between serving assets out of your webserver


Maybe it can work. I'm still a little skeptical. But I don't see why you would ever want to deal with managing all this infrastructure yourself when there are many established providers that can likely do a much better job at a comparable price.


Managing the infrastructure is definitely where it falls short, but if you can get it done quickly it can be very cost-efficient for smaller projects. Three $5 DO boxes can easily handle > 1TB per month, which will cost over $100 on most CDNs.


Couldn't you just use cloudflare's free tier?


CloudFlare's hit rate is predictably terrible for large binary files. We actually use DigitalOcean + Varnish + S3 for serving downloads from mods.io, since the files are relatively big and the site has such a low margin. Traditional CDNs (we were using Fastly) are not scalable cost-wise for that particular site.


Managing the infrastructure is quite easy with Ansible, Puppet, or whatever your flavor of config management is. You can also take advantage of anycast DNS and geo targeting by signing up for a free NSOne.net account. That would probably shave $5 off the total.


If you have relatively low usage, Amazon CloudFront is already pretty cheap.


If you have quite low usage yes, and if you have quite high usage you probably won't be running a CDN on DigitalOcean anyway, but there's an in-between point where CloudFront is pretty expensive (by about 10x over the roll-your-own solution, depending on some assumptions about traffic).

If you build a smallish network of, say, 10 of the $10/mo, 2-TB-traffic DO servers, and can manage to spread the load such that you use an average of half of the free allocation, you pay $100/mo for 10 TB. Cloudfront would charge you $1200/mo for that data transfer. Obviously it's not the same level of reliability or ease of scaling, but it's cheaper enough to be vaguely intriguing—$1200/yr versus $14k/yr is non-peanuts!
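
The same back-of-the-envelope arithmetic as a small Python sketch. All inputs are the figures from this comment; the function names are made up for illustration, and 1 TB = 1000 GB is an assumption.

```python
def roll_your_own(servers=10, price_per_server=10.0,
                  included_tb_each=2.0, utilization=0.5):
    """Return (monthly cost in dollars, TB actually served per month)."""
    monthly = servers * price_per_server
    served_tb = servers * included_tb_each * utilization
    return monthly, served_tb


def implied_per_gb(monthly_cost, tb, gb_per_tb=1000):
    """Effective price per GB for a given monthly bill and volume."""
    return monthly_cost / (tb * gb_per_tb)


diy_monthly, diy_tb = roll_your_own()
diy_rate = implied_per_gb(diy_monthly, diy_tb)   # DIY network
cdn_rate = implied_per_gb(1200.0, diy_tb)        # $1200/mo figure from above
```

With the defaults this works out to $100/mo for 10 TB, roughly $0.01/GB, against the roughly $0.12/GB implied by the $1200/mo figure. Half utilization is the big assumption: the less evenly the load spreads, the smaller the gap.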


hey. thanks for the feedback. Yeah, it's more of an exercise for fun but it can also be done w/ high bandwidth dedicated servers as well. It's good as well if it's heavily dynamic stuff that needs really low latency and you need to execute something custom on the edge. Otherwise MaxCDN.com or a similar service scales better. I'm looking forward to building more custom stuff into our edges at MaxCDN.com so feel free to throw out any ideas. This was used to create a presentation for a docker meetup.


I'm more than a little bit concerned that the article doesn't point out the serious risk of running a single DNS server for the records.

I get that they're trying to keep things simple for their demo, but unsuspecting folks following this guide are in for quite a shock when their GeoDNS is taken out by a single physical server fault.


Good point - it seemed obvious. We'll cover different options for that in a future blog post.


added this to the article in the "Todo" section. thanks for the feedback.


Along similar lines, I also really enjoyed the slides from a talk at DNS OARC 2014 about anycast on a shoestring:

http://www.slideshare.net/natmorris/anycast-on-a-shoe-string


This is similar to the process I used to build http://gcdn.org/ which I and a few clients + friends use on multiple projects. We used to use RR but GeoDNS has proven to be awesome and reliable. Full NS control makes it easier to do rolling upgrades or maintenance too. Excellent write up.

More notes on GeoDNS http://edoceo.com/howto/geodns


This returns a 404-not found for me:

https://gcdn.org/jquery/1.8.0/jquery.js


I would highly recommend Varnish using S3 as a backend origin. Add to that a few 100TB.com dedicated servers and you have an extremely cheap CDN with pretty decent bandwidth and minimal maintenance.

CDNs have the advantage that they control more of the stack so can do more precise routing and have more edge nodes in more places. However, under the right circumstances, you can take the above quite far before a real CDN becomes necessary.
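
To make the idea concrete, here is a minimal Python sketch of what a caching edge in front of an origin does. This is not Varnish configuration and does not use any real S3 API: `fetch_from_origin` is a hypothetical callable standing in for "go back to S3", and the fixed TTL is an assumption.

```python
import time


class EdgeCache:
    """Tiny in-memory cache with a fixed TTL per object."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600,
                 clock=time.monotonic):
        self._fetch = fetch_from_origin   # hypothetical: path -> bytes
        self._ttl = ttl_seconds
        self._clock = clock               # injectable so it can be tested
        self._store = {}                  # path -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired: go back to the origin and refresh the entry.
        self.misses += 1
        body = self._fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body
```

For static files the hit rate should be high, which is why the origin can be something slow and cheap. A real edge would also need eviction and size limits, which this sketch leaves out.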


100TB.com specifically forbids using their service to create a CDN in their ToS[0].

    9. Acceptable Use/Illegal Activity

    d. We strive to maintain a high level of service, and a lot of
    customers depend on our high standards of quality. As such, we
    will not provide Services to those that are using our Services
    for:
    
    vii Using the Services for a content delivery network or content
    distribution network (CDN). An authorized CDN network offered
    through 100TB is accepted. Special requests to use the Services
    to run an unauthorized CDN network may be approved on a case-
    by-case basis. Failure to comply with this policy will result
    in termination of this TOS, and you will not receive a refund
    of the Fees.
[0]: http://www.100tb.com/tos.php


There's a fundamental difference between reselling CDN services, and running a caching web proxy as part of your own infrastructure for your own use.


people forget the story of simplecdn


Just looked this up. Thanks for the tip!


hey Daniel- thanks for the comment. Do you run any special varnish config settings w/ s3?


Nothing fancy right now. It's all static files. I think if I were doing caching of a dynamic application, some special settings would be far more valuable.


Now with libcloud (http://libcloud.readthedocs.org/en/latest/compute/drivers/) you can use multiple platforms in order to get all continents represented.

Including Africa (http://kili.io) where I'm at.


has anyone ever heard about these guys in South Africa: http://www.teraco.co.za/? Any other recommendations for Africa and South Africa?


awesome. Thanks for posting. I had that as one of my todo's in the blog post. Feel free to reach out to me if you want to help with a follow up post!


Why Africa and not Kenya?


I'm wondering if it would be useful to plug groupcache into this. It's what Google uses for serving downloads (e.g., Chrome downloads). Inherently distributed and self-balancing. I might give that a try.


Looks promising, but might need to add Doozer to the stack as groupcache has an issue with maintaining a list of peers where the cached data is distributed. Definitely worth looking into though.

https://github.com/ha/doozerd
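
For anyone wondering why the peer list matters: as far as I understand it, groupcache decides which peer owns a key by consistent hashing over the list of peers, so every node has to agree on that list. A rough Python sketch of the idea follows. This is not groupcache's code (which is Go), and the peer names in any usage are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys to peers."""

    def __init__(self, peers, replicas=50):
        # Each peer gets `replicas` points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{i}:{peer}"), peer)
            for peer in peers
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        # First 8 bytes of SHA-1 as an integer; any stable hash would do.
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    def owner(self, key):
        """Return the peer owning `key`: the first ring point at or after its hash."""
        if not self._ring:
            raise ValueError("no peers in ring")
        idx = bisect.bisect_left(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Two nodes holding the same peer list always pick the same owner for a key, and removing a peer only moves the keys that peer owned. If the lists drift apart, nodes disagree about ownership, which is the gap something like Doozer would fill.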


great idea.


Reminds me of this article about hosting a CDN on digitalocean for the dash documentation software.

http://blog.kapeli.com/a-poor-mans-cdn


cool - thanks


Definitely nice to see a real-world example of how to use Docker.


woot woot.


Cool article for understanding the basics of a CDN. What sort of page load improvement can one get by adding geographically distributed servers?


it can be quite a lot. you can test using a CDN or a setup like this and the tools over at www.webpagetest.org (from Patrick Meenan - who is awesome) from different locations.


Wouldn't it be better on Linode, since they have a wider selection of PoPs? But I wonder if the port speed would be a problem.


Thanks for sharing. Note that there's a typo in the link to Ewan's github profile (should start with https, not ttps).


thanks. will get it fixed shortly. this wasn't really scheduled for prime time but started taking off. appreciate the feedback!


"This wasn't really scheduled for prime time, but started taking off"

:) You submitted the link to HN and that comment makes it seem like the post accidentally took off!


itchy trigger finger :-) didn't expect people to respond so well on a Sunday.


Just messing! :) Cheers


Also wanted to note the post doesn't have enough Ron Smooth


got this fixed. thanks for the heads up.


Why is this page blocked by my UK work network?


The people doing the blocking can probably better answer that.


or you know, if your concern is cost, go with cdn.net and pay as you go for a kickass global cdn network

full disclosure, I have no business or personal connection to them


There's 6TB of bandwidth built into this home rolled solution. That's going to cost at least $350 at cdn.net.


you're comparing apples to oranges


Excellent tut!


that's awesome +Chris



